MLSea Resource Code



This repository contains the source code and RML mappings used to create MLSea-KG, a declaratively constructed and regularly updated machine learning knowledge graph (KG) with more than 1.44 billion RDF triples containing metadata about machine learning:

  • Datasets
  • Tasks
  • Implementations and related hyper-parameters
  • Experiment executions, their configuration settings and evaluation results
  • Code notebooks and repositories
  • Algorithms
  • Publications
  • Models
  • Scientists and practitioners

The data were gathered and integrated from OpenML, Kaggle and Papers with Code.


MLSea-KG Construction Process Overview

[Figure: MLSea-KG construction process overview]


Data Integration

The resource code directory contains the code used for collecting, pre-processing and sampling the data, and for generating RDF triples from it with the included declarative mappings. The input data sources are the OpenML data extracted through the OpenML API, the Meta Kaggle CSVs and the Papers with Code dumps; none of these are included in this repository. OpenML CSV dumps are also generated to store the data retrieved from the OpenML API.

RML Mappings

The RML mappings used for each platform are also provided, showing the rules used to declaratively construct MLSea-KG. Both the common RML mappings and the corresponding in-memory RML mappings, used to generate RDF from in-memory samples, are provided, together with their YARRRML serializations.


Querying MLSea-KG

MLSea-KG is accessible through our SPARQL endpoint. The sparql_examples folder contains example queries for traversing MLSea-KG.
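
As a rough illustration of querying the endpoint from Python, the sketch below builds a standard SPARQL 1.1 Protocol GET request using only the standard library. The endpoint URL is a placeholder (the real address is not reproduced here), the helper names `build_request` and `run_query` are made up for this example, and the query is a generic one that does not rely on any MLSea-KG vocabulary. It also assumes the endpoint returns SPARQL JSON results when asked to.

```python
import json
import urllib.parse
import urllib.request

# Placeholder: replace with the actual MLSea-KG SPARQL endpoint URL.
ENDPOINT_URL = "https://example.org/sparql"

# Vocabulary-agnostic query that simply returns a handful of triples.
SAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_request(query, endpoint=ENDPOINT_URL):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query, endpoint=ENDPOINT_URL, timeout=30):
    """Send the query and return the result bindings as a list of dicts."""
    request = build_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["results"]["bindings"]
```

Once `ENDPOINT_URL` points at the real endpoint, `run_query(SAMPLE_QUERY)` should return one dictionary per result row, and the queries in the sparql_examples folder can be passed in the same way.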


MLSea-KG Snapshots

MLSea-KG snapshots are available at MLSea-KG's Zenodo repository.


Resource Code Package Installation

Clone the repository:

git clone https://github.com/dtai-kg/MLSea-KGC.git

Install dependencies:

pip install -r requirements.txt

Resource Code Package Usage


Import Original Data Sources

  • Edit 'config.py' to set the target locations where imported data sources will be stored.

  • Download Kaggle metadata through the Meta Kaggle dataset.

  • Download Papers with Code metadata through the Papers with Code dump files.

  • Download OpenML metadata through the OpenML API service and store it as CSV backups with 'openml_data_collector.py':

      python openml_data_collector.py
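
Before moving on, it can help to confirm that all three data sources are actually in place. The sketch below is one possible sanity check; the directory names and the `DATA_SOURCES` dictionary are purely illustrative and are not taken from the repository's 'config.py', so adapt them to whatever locations you configured.

```python
from pathlib import Path

# Hypothetical layout: these names are illustrative only and do not
# come from the repository's config.py.
DATA_SOURCES = {
    "openml": Path("data/openml"),
    "kaggle": Path("data/kaggle"),
    "pwc": Path("data/pwc"),
}


def missing_sources(sources):
    """Return the names of sources whose directory is absent or empty."""
    missing = []
    for name, folder in sources.items():
        # A source counts as missing if the folder does not exist
        # or exists but contains no files or subfolders.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing
```

Calling `missing_sources(DATA_SOURCES)` returns an empty list when every configured directory exists and contains at least one entry.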
    

Process RDF Mappings

View and explore the RDF mappings. Update the input source paths in the mappings so that they point to the location of your local data sources.
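
If many mappings need the same change, the path update can be scripted. The following is a minimal sketch, assuming the mappings are in Turtle and declare their inputs as plain string literals of the form `rml:source "path/to/file.csv"`. The function name `repoint_sources` is made up for this example, and mappings that describe their sources differently (for instance the in-memory variants or the YARRRML files) would need a different approach.

```python
import re
from pathlib import PurePosixPath

# Matches simple string-valued sources such as: rml:source "old/dir/file.csv"
SOURCE_PATTERN = re.compile(r'(rml:source\s+")([^"]+)(")')


def repoint_sources(mapping_text, new_dir):
    """Rewrite every rml:source literal to live under new_dir, keeping the file name."""

    def swap(match):
        # Keep only the file name of the old path and re-root it under new_dir.
        file_name = PurePosixPath(match.group(2)).name
        new_path = PurePosixPath(new_dir) / file_name
        return match.group(1) + str(new_path) + match.group(3)

    return SOURCE_PATTERN.sub(swap, mapping_text)
```

The function works on the mapping text, so a typical use is to read a mapping file, pass its contents through `repoint_sources`, and write the result to a copy that you review before running the integration scripts.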


Generate RDF dumps

Generate the RDF dumps of MLSea-KG by running:

python data_integration_openml.py
python data_integration_kaggle.py
python data_integration_pwc.py
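
The three commands can also be chained from a small driver so that a failure in one platform stops the run early. This is only a convenience sketch: the script names are the ones listed above, while `run_all` and its `runner` parameter are hypothetical helpers introduced here, and the driver is assumed to be started from the directory that contains the scripts.

```python
import subprocess
import sys

# Script names as listed in this README.
SCRIPTS = [
    "data_integration_openml.py",
    "data_integration_kaggle.py",
    "data_integration_pwc.py",
]


def run_all(scripts=SCRIPTS, runner=subprocess.run):
    """Run each script with the current interpreter, stopping at the first failure."""
    completed = []
    for script in scripts:
        # check=True makes subprocess.run raise CalledProcessError on a non-zero exit.
        runner([sys.executable, script], check=True)
        completed.append(script)
    return completed
```

Calling `run_all()` executes the scripts in the listed order and returns the names of those that finished; the `runner` argument exists only so the function can be exercised without launching the real scripts.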

Cite

Thank you for reading! To cite our resource:

@InProceedings{dasoulas2024mlsea,
    author    = {Dasoulas, Ioannis and Yang, Duo and Dimou, Anastasia},
    booktitle = {The Semantic Web},
    title     = {{MLSea: A Semantic Layer for Discoverable Machine Learning}},
    year      = {2024}
}
