codemeta's Introduction

CodeMeta

Join the chat at https://gitter.im/codemeta/codemeta

CodeMeta contributors are creating a minimal metadata schema for science software and code, in JSON and XML. The goal of CodeMeta is to create a concept vocabulary that can be used to standardize the exchange of software metadata across repositories and organizations. CodeMeta started by comparing the software metadata used across multiple repositories, which resulted in the CodeMeta Metadata Crosswalk. That crosswalk was then used to generate a set of software metadata concepts, which were arranged into a JSON-LD context for serialization.

See https://codemeta.github.io for a visualization of the crosswalk table and guides for users and developers.

CodeMeta Schema

The schemas for released versions of CodeMeta are:

  • CodeMeta-3.0: https://w3id.org/codemeta/v3.0
    • Matthew B. Jones, Carl Boettiger, Abby Cabunoc Mayes, Arfon Smith, Morane Gruenpeter, Valentin Lorentz, Thomas Morrell, Daniel Garijo, Peter Slaughter, Kyle Niemeyer, Yolanda Gil, Martin Fenner, Krzysztof Nowak, Mark Hahnel, Luke Coy, Alice Allen, Mercè Crosas, Ashley Sands, Neil Chue Hong, Patricia Cruse, Daniel S. Katz, Carole Goble, Bryce Mecum, Alejandra Gonzalez-Beltran, Noam Ross. 2023. CodeMeta: an exchange schema for software metadata. Version 3.0. https://w3id.org/codemeta/v3.0
  • CodeMeta-2.0: https://doi.org/10.5063/schema/codemeta-2.0
    • Matthew B. Jones, Carl Boettiger, Abby Cabunoc Mayes, Arfon Smith, Peter Slaughter, Kyle Niemeyer, Yolanda Gil, Martin Fenner, Krzysztof Nowak, Mark Hahnel, Luke Coy, Alice Allen, Mercè Crosas, Ashley Sands, Neil Chue Hong, Patricia Cruse, Daniel S. Katz, Carole Goble. 2017. CodeMeta: an exchange schema for software metadata. Version 2.0. KNB Data Repository. doi:10.5063/schema/codemeta-2.0
  • CodeMeta-1.0: https://doi.org/10.5063/schema/codemeta-1.0
    • Matthew B. Jones, Carl Boettiger, Abby Cabunoc Mayes, Arfon Smith, Peter Slaughter, Kyle Niemeyer, Yolanda Gil, Martin Fenner, Krzysztof Nowak, Mark Hahnel, Luke Coy, Alice Allen, Mercè Crosas, Ashley Sands, Neil Chue Hong, Patricia Cruse, Daniel S. Katz, Carole Goble. 2016. CodeMeta: an exchange schema for software metadata. KNB Data Repository. doi:10.5063/schema/codemeta-1.0

Contributors

CodeMeta is a community project with many contributors spanning research, education, and engineering domains. See our list of Contributors. You can cite the CodeMeta schema and project as:

Matthew B. Jones, Carl Boettiger, Abby Cabunoc Mayes, Arfon Smith, Morane Gruenpeter, Valentin Lorentz, Thomas Morrell, Daniel Garijo, Peter Slaughter, Kyle Niemeyer, Yolanda Gil, Martin Fenner, Krzysztof Nowak, Mark Hahnel, Luke Coy, Alice Allen, Mercè Crosas, Ashley Sands, Neil Chue Hong, Patricia Cruse, Daniel S. Katz, Carole Goble, Bryce Mecum, Alejandra Gonzalez-Beltran, Noam Ross. 2023. CodeMeta: an exchange schema for software metadata. Version 3.0. https://w3id.org/codemeta/v3.0

How you can help

Join us! We welcome help formalizing a schema and creating mappings between existing software metadata schemas and the proposed schema. And writing documentation. And evangelizing. And other stuff, however you might be able to contribute.

Project history

This is an extension of the work done by @arfon, @hubgit, @kaythaney and others on Code as a Research Object / fidgit. Code as a research object is a Mozilla Science Lab (@MozillaScience) project working with community members to explore how we can better integrate code and scientific software into the scholarly workflow. Out of this came fidgit - a proof of concept integration between GitHub and figshare, providing a Digital Object Identifier (DOI) for the code which allows for persistent reference linking.

With codemeta, we want to formalize the schema used to map between the different services (GitHub, figshare, Zenodo) to help others plug into existing systems. Having a standard software metadata interoperability schema will allow other data archivers and libraries to join in. This will help keep science on the web shareable and interoperable!

Organizers

The CodeMeta project has a governance model and a governing body to oversee the development and maintenance of the CodeMeta vocabulary, crosswalk table, website, software and other related content.

codemeta's Issues

GitHub crosswalk additions/adjustments

Assuming that the GitHub fields are mapping to the repositories API I have some comments/suggestions for modification:

| Concept | Current | Suggested | Comment |
|---|---|---|---|
| SoftwareIdentifier | id | full_name | More descriptive but potentially less stable? |
| RelatedLink | --- | homepage | |
| ZippedCodeLink | --- | archive_url | |
| ProgrammingLanguage | --- | languages_url | |
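
As a rough illustration of the suggested mapping, the sketch below applies it to a dictionary shaped like a GitHub repositories API response. The names `GITHUB_CROSSWALK` and `apply_crosswalk` are hypothetical, and the sample values are made up for the example; only the four field names come from the table above.

```python
# Hypothetical mapping from crosswalk concepts to GitHub repository fields,
# following the "Suggested" column of the table above.
GITHUB_CROSSWALK = {
    "SoftwareIdentifier": "full_name",
    "RelatedLink": "homepage",
    "ZippedCodeLink": "archive_url",
    "ProgrammingLanguage": "languages_url",
}


def apply_crosswalk(repo: dict) -> dict:
    """Return a dict keyed by crosswalk concept, skipping absent or empty fields."""
    return {
        concept: repo[field]
        for concept, field in GITHUB_CROSSWALK.items()
        if repo.get(field)
    }


# Illustrative input only; the values are not taken from a real API call.
sample_repo = {
    "id": 123,
    "full_name": "codemeta/codemeta",
    "homepage": "https://codemeta.github.io",
    "archive_url": "https://api.github.com/repos/codemeta/codemeta/{archive_format}{/ref}",
    "languages_url": "https://api.github.com/repos/codemeta/codemeta/languages",
}

record = apply_crosswalk(sample_repo)
print(record["SoftwareIdentifier"])
```

Note that `languages_url` and `archive_url` are links (the latter a URL template), so a real crosswalk would likely need a follow-up request to obtain the language list or archive itself.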

Add additional dependency fields to crosswalk

Software packages can have different levels of requirements. For example, R has the
levels:

  • Depends
  • Imports
  • Suggests

The crosswalk currently has 'Dependency', but fields for 'Imports' and 'Suggests'
should be added.

Any ideas on more generic names for 'Imports' and 'Suggests'?
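
To make the three levels concrete, here is a minimal sketch that reads them from the text of an R DESCRIPTION file and keeps them separate rather than collapsing them into a single 'Dependency' list. The function name `parse_dependency_levels` and the sample package are hypothetical, and the parser is deliberately simplified (it only strips version constraints in parentheses).

```python
import re

# The three requirement levels named in this issue.
LEVELS = ("Depends", "Imports", "Suggests")


def parse_dependency_levels(description_text: str) -> dict:
    """Split the Depends/Imports/Suggests fields of a DESCRIPTION file into lists."""
    fields = {}
    current = None
    for line in description_text.splitlines():
        if line[:1] in (" ", "\t") and current:
            # Indented lines continue the previous field.
            fields[current] += " " + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            current = key.strip()
            fields[current] = value.strip()
    result = {}
    for level in LEVELS:
        raw = fields.get(level, "")
        # Drop version constraints such as "(>= 3.5.0)" and surrounding whitespace.
        names = [re.sub(r"\s*\(.*?\)", "", item).strip() for item in raw.split(",")]
        result[level] = [name for name in names if name]
    return result


# Made-up DESCRIPTION content for illustration.
sample = """Package: demo
Depends: R (>= 3.5.0), methods
Imports: jsonlite,
    curl
Suggests: testthat
"""

levels = parse_dependency_levels(sample)
print(levels)
```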

Missing concepts in the crosswalk?

Apologies if this has been previously discussed but I couldn't find anything about it.

I have a few thoughts about potentially missing concepts in the crosswalk. Specifically, I didn't see entries for publisher, the programming language used, or entries that link to different versions of the same software. Also, it might be good to consider the concept of SoftwareContributor in addition to SoftwareAuthor.

I think publisher and SoftwareContributor are important in terms of attribution. Publisher needs no explanation. SoftwareContributor could include those who didn't write any code but nevertheless made a contribution (e.g., project management, administration). Explicitly stating the programming language(s) has implications for discoverability and reuse (e.g., searching for a library that is written in Python only).

The idea of linking together different versions of the same software is relevant to the Software Entities Model. I did see that dependencies were discussed in the JSON-LD project here, but I think this is a slightly different concept.

Thoughts?

Nicolas P. Rougier: new volunteer via Mozilla Science Lab Collaborate

Hello,

I'm in the process (with @khinsen) of creating a peer-reviewed scientific journal that lives on github and which is dedicated to the replication of results in computational science. It is not yet officially open but you can get a better idea at https://github.com/ReScience/ReScience/wiki.
We're currently in the process of testing the review system to check if it will allow us to ultimately publish an article with the associated DOI through Zenodo (ReScience/ReScience-submission#3).

I've just stumbled upon your project and, while I'm not sure how I can help, I would definitely be interested in following your progress and integrating the proposed standardized metadata in the journal.

Nicolas


This issue was created by @rougier via Mozilla Science Lab Collaborate

Add DataCite Schema to the crosswalk table?

One advantage the integrated GitHub -> Zenodo workflow seems to offer is that, in being assigned a DOI, the software is also then indexed, with metadata, in DataCite. Since we can query the DataCite API, this seems like it would offer a plausible solution to a software discovery index (a la NIH's proposal for such an index, http://softwarediscoveryindex.org/).

So, perhaps the DataCite schema should also get a column on the crosswalk chart, to help track what metadata does and doesn't make it this far down the GitHub -> Zenodo -> DataCite pipeline (or any other pipeline that leads software to a DataCite DOI).

Unfortunately, it seems the amount of metadata that currently gets through this pipeline is pretty small (from a random example of software in Zenodo: http://data.datacite.org/application/vnd.citationstyles.csl+json/10.5281/zenodo.15094).

Consider adding series identifier distinction

Software citation requires being able to cite both the specific version used and the more general concept of a software product. Within the data world the idea of a Persistent Identifier that points at a specific version differs from a Series Identifier that points at a whole version chain, and that resolves to the latest version. Consider implementing this distinction in CodeMeta identifier semantics.
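
One way to picture the distinction is the sketch below, in which a series identifier resolves to the latest version while a version identifier resolves only to itself. The class `SoftwareSeries` and the `example:` identifiers are hypothetical and are not part of CodeMeta; this is an illustration of the semantics being proposed, not an implementation of any existing resolver.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SoftwareSeries:
    """A chain of versions that share one series identifier."""

    series_id: str
    version_ids: List[str] = field(default_factory=list)  # oldest first

    def add_version(self, version_id: str) -> None:
        self.version_ids.append(version_id)

    def resolve(self, identifier: str) -> str:
        """Map a series identifier to the latest version, or a version identifier to itself."""
        if identifier == self.series_id:
            if not self.version_ids:
                raise LookupError("series has no versions yet")
            return self.version_ids[-1]
        if identifier in self.version_ids:
            return identifier
        raise LookupError(f"unknown identifier: {identifier}")


# Made-up identifiers for illustration.
series = SoftwareSeries("example:series")
series.add_version("example:v1.0")
series.add_version("example:v2.0")

print(series.resolve("example:series"))  # resolves to the most recent version
print(series.resolve("example:v1.0"))    # a version identifier stays pinned
```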

Should we crosswalk to ADMS.SW?

The Asset Description Metadata Schema for Software "is a metadata vocabulary to describe software making it possible to more easily explore, find, and link software on the Web. The specification reuses existing specifications, such as DOAP, SPDX, ISO 19770-2, ADMS, and the Trove software map. By using ADMS.SW to describe software in software forges, repositories, and catalogues, publishers increase discoverability and enable applications easily to consume software metadata."

However I'm not sure if anyone actually started implementing it.

DateModified

Does this refer to the code itself, or to the metadata? What provisions are there for editing metadata? Say that I have to update my email - how can I do that?

This isn't clear to me at the moment, but I may be missing something.

Add additional fields to crosswalk

Add the following fields to the crosswalk concepts

  • softwarePaperCitation
    • the citation for the paper that describes the software
  • softwareCitation
    • the citation for the software itself
  • downloadCount
    • download count of the software from a repository
  • citationCount
    • count of citations for the software itself
  • funding
    • funding for the research and development of the software
  • relatedPublications
    • publications related to the software

Chicken/egg problem: need to include DOI in metadata, but want citation file in release to get DOI?

I'd like to start including a citation metadata file following the codemeta standard with my software, but I'm a bit confused about what appears to me to be a chicken/egg problem: the codemeta schema wants an identifier (e.g. DOI)—but in order to get a DOI for my (GitHub) repo, I need to create a release of the software.

So, I'm a bit confused about the appropriate process here... if I create a release to get the DOI, it would be missing the full metadata associated with that release.

Interest in this project

Hi all,

I'll throw my hat in the ring and say that I'm willing to contribute to this project, as I've found it useful in my investigations regarding research software preservation. I've made some updates to the crosswalk that I'm willing to share, though I can open separate issues/pull requests for those.

In the meantime, I have a quick question about the crosswalk. Is there a reason that 'version' is not included in the CodeMeta column even though it's there in the codemeta.jsonld, or is it just an omission?
-Fernando

Determine appropriate schema / type for all values in codemeta.jsonld

Several of the terms in codemeta.jsonld have types that are currently serving as 'placeholders' until
more appropriate types can be determined, as neither schema.org nor Dublin Core has something
close (maybe we could have types added to schema.org or use another schema?). The terms that could be upgraded are:

  • "accessList": "dcterms:accessRights"
  • "buildInstructions": "schemaorg:URL"
  • "citationCount": "schemaorg:integer"
  • "contIntegration": "schemaorg:URL"
  • "docsCoverage": "schemaorg:integer"
  • "downloadCount": "schemaorg:integer"
  • "embargoDate": "schemaorg:date"
  • "isAutomatedBuild": "schemaorg:Text"
  • "package": "schemaorg:Text"
  • "inputs": "schemaorg:Text"
  • "outputs": "schemaorg:Text"
  • "function": "schemaorg:Text"
  • "funding": "schemaorg:Text"
  • "objectType": "dcterms:type"
  • "programmingLanguage": "schemaorg:Text"
  • "readme": "schemaorg:URL"
  • "issueTracker": "schemaorg:URL"
  • "relatedLink": "schemaorg:URL"
  • "relatedPublications": "schemaorg:URL"
  • "relationship": "schemaorg:Text"
  • "testCoverage": "schemaorg:integer"
  • "zippedCode": "schemaorg:URL"

Create JSON-LD context for codemeta

JSON-LD allows one to create a @context that is published in an external file and can be referenced by JSON files that want to import the JSON-LD context definition. This works like a schema definition for the vocabulary. Create one for codemeta.
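
As a minimal sketch of what referencing an external context looks like, the snippet below builds a small JSON document whose `@context` points at a published context URL instead of defining the terms inline. The URL is the CodeMeta 3.0 identifier listed earlier on this page; the software name and version are made-up values, and the choice of properties is only an assumption for illustration.

```python
import json

# Published context location (the CodeMeta 3.0 identifier listed above).
CONTEXT_URL = "https://w3id.org/codemeta/v3.0"

# The document only references the context; term definitions live in the external file.
document = {
    "@context": CONTEXT_URL,
    "@type": "SoftwareSourceCode",
    "name": "example-tool",      # made-up value
    "version": "0.1.0",          # made-up value
    "programmingLanguage": "Python",
}

serialized = json.dumps(document, indent=2)
print(serialized)
```

A JSON-LD processor that fetches the context can then expand short keys such as `name` into full IRIs, which is what lets the external file act like a schema definition for the vocabulary.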

for use case table, add 'Additional metadata' column

Many of the use case requirements listed in the use case table are necessary but not sufficient to handle the use case. Adding a column that indicates that 'Additional software metadata' are required for the use case would be useful.
