krr-oxford / deeponto
A package for ontology engineering with deep learning and language models.
Home Page: https://krr-oxford.github.io/DeepOnto/
License: Apache License 2.0
Is your feature request related to a problem? Please describe.
Using the ELK reasoner prints a lot of console messages.
Describe the solution you'd like
Disable INFO-level messages.
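If the messages are routed through Python's logging module, raising the level of the relevant loggers would hide them; a minimal sketch (the logger names "deeponto" and "elk" are assumptions, and note that if ELK logs directly from the JVM side this will not help):

```python
import logging

# Hypothetical logger names: adjust to whichever logger actually emits
# the ELK messages in your environment (assumption, not DeepOnto's API).
for name in ("deeponto", "elk"):
    logging.getLogger(name).setLevel(logging.WARNING)  # hide INFO messages

# INFO records on these loggers are now filtered out.
logging.getLogger("elk").info("this will not be shown")
```
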
Dear all,
Thank you for DeepOnto.
I was wondering whether there is example code for consistency checking, e.g.
from deeponto.onto import Ontology
onto = Ontology("path_to_ontology.owl", "hermit")
assert onto.consistent()
How can I obtain ontology embeddings for my own dataset?
Describe the bug
Under some circumstances during the mapping extension stage, the tokenizer throws the error IndexError: list index out of range.
The error originates at bert_classifier.py line 185.
This is the same error at the same location inside the tokenizer as huggingface/tokenizers#993, which was caused by the data passed to the tokenizer.
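Since the linked tokenizers issue was triggered by malformed input (e.g. empty strings), one hedged workaround is to sanitise annotation texts before they reach the tokenizer. A minimal sketch (filter_texts is a hypothetical helper, not part of DeepOnto):

```python
def filter_texts(texts):
    """Drop None and empty/whitespace-only strings before tokenisation.

    Hypothetical helper: DeepOnto does not ship this function; it only
    illustrates the kind of input sanitisation that avoids the tokenizer
    error discussed in huggingface/tokenizers#993.
    """
    return [t for t in texts if isinstance(t, str) and t.strip()]

annotations = ["Piano", "", "  ", None, "Keyboard instrument"]
clean = filter_texts(annotations)
print(clean)  # ['Piano', 'Keyboard instrument']
```
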
To Reproduce
I have reproduced this error with these settings:
| Logs & stack trace | max_length_for_input | batch_size_for_training | Source ontology | Target ontology |
|---|---|---|---|---|
| link | 256 | 16 | music-representation.owl | musicClasses.owl @ 2ebb641 |
| link | 128 | 8 | core.owl | musicClasses.owl @ ebc2d09 |
Expected behavior
The stage and the pipeline should complete successfully.
I have tried to run BERTMap, but got the following error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
In fact, this is a bug that was introduced in Transformers 4.12.3 and has been fixed in 4.13.0. In short, the output of the tokenizer is a BatchEncoding, but the Trainer only transfers Union[torch.Tensor, Tuple, List, Dict] objects to the GPU. (Please refer to this link for more details: when running the Trainer cell, it found two devices, cuda:0 and CPU.)
I think this bug was introduced in commit 086a25cae945d496765cbbb09b36f9780d676ac7. Please consider pinning the Transformers version.
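Until the dependency is pinned upstream, forcing a fixed Transformers release in your own environment sidesteps the device-mismatch bug; for example, a requirements pin (the exact bound is an assumption, adjust to what your setup tolerates):

```text
# requirements.txt — avoid the 4.12.x regression described above
transformers>=4.13.0
```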
This is Shriram; I emailed you recently about my interest in using DeepOnto. I am currently working with two different autonomous-vehicle ontologies and am unable to run the BERTMap model due to "ValueError: evaluation strategy steps requires either non-zero --eval_steps or --logging_steps". I am unsure where this error arises from.
/usr/local/lib/python3.10/dist-packages/transformers/training_args.py in __post_init__(self)
1301 self.eval_steps = self.logging_steps
1302 else:
-> 1303 raise ValueError(
1304 f"evaluation strategy {self.evaluation_strategy} requires either non-zero --eval_steps or"
1305 " --logging_steps"
ValueError: evaluation strategy steps requires either non-zero --eval_steps or --logging_steps
This is the entire error I am getting.
Could the number of instances in my ontology be a reason for this error? I have tried multiple value changes in my config YAML file, but none of them work. Kindly help me with this.
Thanks in advance!
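The error comes from Transformers' TrainingArguments validation: with evaluation_strategy="steps", at least one of eval_steps or logging_steps must be non-zero, so the fix is in the trainer configuration rather than the ontology size. A minimal stdlib sketch of that check (a simplified paraphrase of the library's logic, not its actual code):

```python
def check_eval_args(evaluation_strategy, eval_steps=0, logging_steps=0):
    """Mimic, in simplified form, the TrainingArguments validation that
    raises the reported ValueError."""
    if evaluation_strategy == "steps" and not (eval_steps or logging_steps):
        raise ValueError(
            "evaluation strategy steps requires either non-zero "
            "--eval_steps or --logging_steps"
        )
    return True

check_eval_args("steps", eval_steps=500)  # passes: eval_steps is non-zero
try:
    check_eval_args("steps")  # neither step count set -> error
except ValueError as e:
    print(e)
```
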
Describe the bug
While running BERTMap I'm receiving the error "ZeroDivisionError: division by zero".
To Reproduce
Launch BERTMap with these input files:
- configuration: bertmap.yaml
- source ontology: ontology-network.ttl
- target ontology: music.owl
Expected behavior
Mapping search between the ontologies should work normally
Actual output
[Time: 00:18:47] - [PID: 172] - [Model: bertmap]
Load the following configurations:
{
"model": "bertmap",
"output_path": "/content",
"annotation_property_iris": [
"http://www.w3.org/2000/01/rdf-schema#label",
"http://www.geneontology.org/formats/oboInOwl#hasSynonym",
"http://www.geneontology.org/formats/oboInOwl#hasExactSynonym",
"http://www.w3.org/2004/02/skos/core#exactMatch",
"http://www.ebi.ac.uk/efo/alternative_term",
"http://www.orpha.net/ORDO/Orphanet_#symbol",
"http://purl.org/sig/ont/fma/synonym",
"http://www.w3.org/2004/02/skos/core#prefLabel",
"http://www.w3.org/2004/02/skos/core#altLabel",
"http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#P108",
"http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#P90"
],
"known_mappings": null,
"auxiliary_ontos": [],
"bert": {
"pretrained_path": "bert-base-uncased",
"max_length_for_input": 128,
"num_epochs_for_training": 3.0,
"batch_size_for_training": 16,
"batch_size_for_prediction": 128,
"resume_training": null
},
"global_matching": {
"enabled": true,
"num_raw_candidates": 200,
"num_best_predictions": 10,
"mapping_extension_threshold": 0.8,
"mapping_filtered_threshold": 0.9
}
}
[Time: 00:18:47] - [PID: 172] - [Model: bertmap]
Save the configuration file at /content/bertmap/config.yaml.
[Time: 00:18:47] - [PID: 172] - [Model: bertmap]
Construct new text semantics corpora and save at /content/bertmap/data/text-semantics.corpora.json.
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
<ipython-input-12-a888744a31b2> in <cell line: 1>()
----> 1 bertmap = BERTMapPipeline(src_onto, tgt_onto, config)
/usr/local/lib/python3.10/dist-packages/deeponto/align/bertmap/pipeline.py in __init__(self, src_onto, tgt_onto, config)
119 # load or construct the corpora
120 self.corpora_path = os.path.join(self.data_path, "text-semantics.corpora.json")
--> 121 self.corpora = self.load_text_semantics_corpora()
122
123 # load or construct fine-tune data
/usr/local/lib/python3.10/dist-packages/deeponto/align/bertmap/pipeline.py in load_text_semantics_corpora(self)
251 corpora.save(self.data_path)
252
--> 253 return self.load_or_construct(self.corpora_path, data_name, construct)
254
255 self.logger.info(f"No training needed; skip the construction of {data_name}.")
/usr/local/lib/python3.10/dist-packages/deeponto/align/bertmap/pipeline.py in load_or_construct(self, data_file, data_name, construct_func, *args, **kwargs)
227 else:
228 self.logger.info(f"Construct new {data_name} and save at {data_file}.")
--> 229 construct_func(*args, **kwargs)
230 # load the data file that is supposed to be saved locally
231 return FileUtils.load_file(data_file)
/usr/local/lib/python3.10/dist-packages/deeponto/align/bertmap/pipeline.py in construct()
241
242 def construct():
--> 243 corpora = TextSemanticsCorpora(
244 src_onto=self.src_onto,
245 tgt_onto=self.tgt_onto,
/usr/local/lib/python3.10/dist-packages/deeponto/align/bertmap/text_semantics.py in __init__(self, src_onto, tgt_onto, annotation_property_iris, class_mappings, auxiliary_ontos)
517 # build intra-ontology corpora
518 # negative sample ratios are by default
--> 519 self.intra_src_onto_corpus = IntraOntologyTextSemanticsCorpus(src_onto, annotation_property_iris)
520 self.add_samples_from_sub_corpus(self.intra_src_onto_corpus)
521 self.intra_tgt_onto_corpus = IntraOntologyTextSemanticsCorpus(tgt_onto, annotation_property_iris)
/usr/local/lib/python3.10/dist-packages/deeponto/align/bertmap/text_semantics.py in __init__(self, onto, annotation_property_iris, soft_negative_ratio, hard_negative_ratio)
310 self.onto = onto
311 # $\textsf{BERTMap}$ does not apply synonym transitivity
--> 312 self.thesaurus = AnnotationThesaurus(onto, annotation_property_iris, apply_transitivity=False)
313
314 self.synonyms = self.thesaurus.synonym_sampling()
/usr/local/lib/python3.10/dist-packages/deeponto/align/bertmap/text_semantics.py in __init__(self, onto, annotation_property_iris, apply_transitivity)
74 self.annotation_property_iris = iris
75 total_number_of_annotations = sum([len(v) for v in self.annotation_index.values()])
---> 76 self.average_number_of_annotations_per_class = total_number_of_annotations / len(self.annotation_index)
77
78 # synonym groups
ZeroDivisionError: division by zero
Following the stack trace, I see that the code uses the length of self.annotation_index as the denominator, but apparently this length is zero. This dictionary is built by Ontology::build_annotation_index() based on annotation_property_iris, which, as can be seen above, is correctly populated and not empty. So I suspect the bug is located somewhere in this function, but I wasn't able to pinpoint exactly where.
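The crash means the annotation index came back empty, i.e. none of the configured annotation_property_iris matched any annotation in the loaded ontology (plausible for a .ttl input whose labels use other properties). A minimal sketch of the failing computation with a guard for the empty case (mock data, not DeepOnto's code):

```python
def average_annotations_per_class(annotation_index):
    """Average number of annotations per class, guarding the empty-index
    case that triggers the reported ZeroDivisionError."""
    if not annotation_index:
        raise ValueError(
            "annotation index is empty: none of the configured "
            "annotation_property_iris matched any entity annotations"
        )
    total = sum(len(v) for v in annotation_index.values())
    return total / len(annotation_index)

# Mock index: class IRI -> set of annotation strings
index = {"ex:ClassA": {"label a"}, "ex:ClassB": {"label b", "synonym b"}}
print(average_annotations_per_class(index))  # 1.5
```
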
While exploring the documentation I came across a small typo in one of the example code snippets: a quotation mark is missing in a line of code (see below). Nothing concerning, but it may cause an unexpected error for those who copy-paste the code :)
onto.get_subsumption_axioms(entity_type="Classes)
--> onto.get_subsumption_axioms(entity_type="Classes")
Describe the bug
The BERTMap model got stuck at the mapping extension phase.
To Reproduce
Steps to reproduce the behavior:
Run BERTMap on SNOMED-FMA (Body) task.
Hi, I am not able to reproduce the exact H@1 and MRR for EditSim on the FMA-SNOMED task as reported in Table 4 of https://arxiv.org/pdf/2205.03447.pdf.
This is the command used:
python om_eval.py --saved_path './om_results' --pred_path './onto_match_experiment2/edit_sim/global_match/src2tgt' --ref_anchor_path 'data/equiv_match/refs/snomed2fma.body/unsupervised/src2tgt.rank/for_eval' --hits_at 1
These are the generated numbers: H@1: .841 and MRR: .89
Reported numbers in the paper: H@1: .869 and MRR: .895
I am not sure why the numbers are not consistent.
Is there anything that needs to be modified in the code to get the reported numbers?
Bug Description
I'm trying to verbalise a class expression. The code I'm executing is as follows:
from deeponto.onto import Ontology, OntologyVerbaliser, OntologySyntaxParser
onto = Ontology("ontology.owl")
verbaliser = OntologyVerbaliser(onto)
complex_concepts = list(onto.get_asserted_complex_classes())
v_concept = verbaliser.verbalise_class_expression(complex_concepts[0])
Here ontology.owl is a simple ontology in RDF/XML syntax that contains an atomic concept, a datatype property, and a complex concept. The whole ontology is provided in Additional Context.
I get the following error:
Traceback (most recent call last):
File "/home/pg-xai2/sampling/examples/prova_deeponto.py", line 42, in <module>
v_concept = verbaliser.verbalise_class_expression(complex_concepts[0])
File "/home/pg-xai2/.conda/envs/ontolearn/lib/python3.9/site-packages/deeponto/onto/verbalisation.py", line 227, in verbalise_class_expression
return self._verbalise_junction(parsed_class_expression)
File "/home/pg-xai2/.conda/envs/ontolearn/lib/python3.9/site-packages/deeponto/onto/verbalisation.py", line 334, in _verbalise_junction
other_children.append(self.verbalise_class_expression(child))
File "/home/pg-xai2/.conda/envs/ontolearn/lib/python3.9/site-packages/deeponto/onto/verbalisation.py", line 214, in verbalise_class_expression
return self._verbalise_iri(parsed_class_expression)
File "/home/pg-xai2/.conda/envs/ontolearn/lib/python3.9/site-packages/deeponto/onto/verbalisation.py", line 254, in _verbalise_iri
verbal = self.vocab[iri] if not self.keep_iri else iri_node.text
KeyError: 'http://dl-learner.org/mutagenesis#Compound'
This is the printed complex concept (maybe you can just try to manually construct this concept and test it out):
ObjectIntersectionOf(<http://dl-learner.org/mutagenesis#Compound> DataSomeValuesFrom(<http://dl-learner.org/mutagenesis#act> DatatypeRestriction(xsd:decimal facetRestriction(minInclusive "0.04"^^xsd:decimal))))
To Reproduce
Execute the code described above using the given ontology.
Additional context
OS: Linux
content of ontology.owl
:
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xml:base="http://dl-learner.org/mutagenesis"
xmlns="http://dl-learner.org/mutagenesis#">
<owl:Ontology rdf:about="http://dl-learner.org/mutagenesis"/>
<owl:DatatypeProperty rdf:about="#act">
<rdfs:domain rdf:resource="#Compound"/>
<rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#double"/>
</owl:DatatypeProperty>
<owl:Class rdf:about="#Compound"/>
<owl:Class rdf:about="http://dl-learner.org/Pred_1">
<rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
<owl:equivalentClass>
<owl:Class>
<owl:intersectionOf rdf:parseType="Collection">
<rdf:Description rdf:about="#Compound"/>
<owl:Restriction>
<owl:onProperty rdf:resource="#act"/>
<owl:someValuesFrom>
<rdfs:Datatype>
<owl:onDatatype rdf:resource="http://www.w3.org/2001/XMLSchema#decimal"/>
<owl:withRestrictions>
<rdf:Description>
<rdf:first>
<rdf:Description>
<xsd:minInclusive rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">0.04</xsd:minInclusive>
</rdf:Description>
</rdf:first>
<rdf:rest rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"/>
</rdf:Description>
</owl:withRestrictions>
</rdfs:Datatype>
</owl:someValuesFrom>
</owl:Restriction>
</owl:intersectionOf>
</owl:Class>
</owl:equivalentClass>
</owl:Class>
</rdf:RDF>
I tried other ontologies as well, including Carcinogenesis and the whole Mutagenesis ontology, which you can find here. Since they do not contain complex concepts, I tried to verbalise a subclass axiom like the following:
# get subsumption axioms from the ontology
subsumption_axioms = onto.get_subsumption_axioms(entity_type="Classes")
# verbalise the first subsumption axiom
v_sub, v_super = verbaliser.verbalise_class_subsumption_axiom(subsumption_axioms[0])
The same kind of error as mentioned earlier occurred.
In addition to a library, consider also providing a Dockerfile that uses FastAPI to serve web APIs. For instance, instead of having to import the library, I could deploy a Docker container and call the APIs, supplying all the necessary inputs.
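A hedged sketch of what such a container could look like (the app module app.py and any endpoints it exposes are hypothetical; DeepOnto ships no FastAPI server):

```dockerfile
# Sketch only: assumes a hypothetical app.py exposing a FastAPI instance
# named `app` that wraps DeepOnto calls (e.g. a /match endpoint taking
# two ontology files and returning mappings).
FROM python:3.10-slim

# DeepOnto needs a JVM for its OWLAPI bindings
RUN apt-get update && apt-get install -y --no-install-recommends \
    default-jre && rm -rf /var/lib/apt/lists/*

RUN pip install --no-cache-dir deeponto fastapi uvicorn

COPY app.py /app/app.py
WORKDIR /app

CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```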