bio-tools / biotoolsschema

biotoolsSchema : Tool description data model for computational tools in life sciences

License: Creative Commons Attribution Share Alike 4.0 International

biotoolsschema's Issues

Add "R" as an interfaceType

I could see that most of the bioconductor packages in bio.tools are annotated with
resourceType: Tool
interfaceType: Command line

Some other R packages however have:
resourceType: Library
interfaceType: API

I understand that these terms should be rather broad, with only a few of them in the enumeration, however I think it would make sense to add something more specific here. What is an R package really? I could see the definition of "Command line : Text-based interface to a tool or service", which is of course also true for R, but many researchers that use R do so in a semi-graphical interface, and some of them are scared off by 'actual' command lines. I think being able to search explicitly for R packages would give those users a better experience.

Update README.html to include link to GitHub

... and include other core information, as this file is displayed here:
https://bio.tools/schema

Hyperlinked JSON example pointing to XSD schema docs

As discussed with @ekry

Better documentation of element value constraints

Include clear & concise comments on biotoolsXSD constraints ("this field can only contain ...")

Attribute for API-compliance

Requested by ELIXIR EXCELERATE WP 7 - new attribute to capture an API is WP7-compliant

Of course, something generic is needed.

Multiple fixes to 2.0 alpha-01 + docs

Pattern for < name > element: check it does not allow spaces.
Pattern for < description > element: what characters are not allowed? Maybe change the basic type from xs:string if appropriate, to make this more restrictive.
Change the pattern for < id > attribute and id> element (once settled in bio.tools URL scheme)
Settle the enum for resourceType
Pattern for element (once settled in bio.tools URL scheme)
ORCID simple type: specify type, pattern and sample (what is the valid ORCID syntax ?)
Is nesting 'choice' within 'sequence' (in contactDetails and creditDetails) really necessary? Can we just use 'choice'?
Add 'sample' value to PMID and PMCID
Document (on GitHub WIKI?) "roles" used by < credit > element: Developer, Maintainer, Other.
Document (on GitHub WIKI?) "roles" used by < contact > elements: General, Developer, Technical, Scientific, Maintainer, Helpdesk.
Document (on GitHub WIKI?) meaning of < relationshipType > enum
Better pattern for < description > element e.g. that sentence begins in upper case and ends with full stop, only ASCII characters etc.
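
For the ORCID item in the list above, here is a minimal sketch of a syntax check in Python. It assumes the published ORCID iD format (four hyphen-separated groups of four characters, with a final check character computed by ISO 7064 MOD 11-2). The name `is_valid_orcid` is hypothetical and not part of the schema. An XSD pattern facet could enforce only the shape, not the checksum.

```python
import re

# Shape only: this part could also be expressed as an XSD pattern facet.
ORCID_SHAPE = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def is_valid_orcid(orcid: str) -> bool:
    """Check the shape and the ISO 7064 MOD 11-2 check character."""
    if not ORCID_SHAPE.fullmatch(orcid):
        return False
    chars = orcid.replace("-", "")
    total = 0
    for digit in chars[:15]:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[15] == expected
```

A check like this would have to live in the application, since the checksum is beyond what an XSD 1.0 pattern can express.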

Check compatibility with relevant schemes and standards

Including:

SPDX (The Software Package Data Exchange) specification is a standard format for communicating the components, licenses and copyrights associated with a software package
http://spdx.org/
HCLS (Dataset Descriptions: from Semantic Web in Health Care and Life Sciences Interest Group)
http://www.w3.org/2001/sw/hcls/notes/hcls-dataset/#s4_1
W3CPROV (provenance information)
https://www.w3.org/TR/prov-overview/
OntoSoft
http://ontosoft.org/ontology/software/
RFC standard for describing URL templates (URL scheme)
https://tools.ietf.org/html/rfc6570
(see also https://support.google.com/webmasters/answer/6080550?hl=en)
schema.org
http://www.schema.org/docs/extension.html
biodbcore
http://database.oxfordjournals.org/content/2011/baq027.full
GA4GH tool registry schema
https://github.com/ga4gh/tool-registry-schemas/
re3data.org Schema for the Description of Research Data Repositories
https://blog.datacite.org/new-re3data-schema-and-search-functionality/

"Machine-understandable" but application-specific annotation inside XSD

xs:appinfo is a standard mechanism for defining business logic beyond the expressive power of the XSD language.

It avoids the need for hard-coding such logic into an application that uses the given XSD-based data format.

Example

. . .
<xs:schema ... xmlns:biotoolsai="http://biotoolsregistry.org/appinfo" ... xmlns:xs="http://www.w3.org/2001/XMLSchema" ... >
. . .
    <xs:element name="license" minOccurs="0"> 
        <xs:annotation> 
            <xs:documentation>Software or data usage license</xs:documentation> 
            <xs:appinfo> 
                <biotoolsai:usage recommended="true"/> 
                <biotoolsai:longDescription>
                    #Blah blah

                    `biotools:license` is blah blaaaah

                    ## GRRRRRRR

                    **WOOBAR**, isn't it?
                </biotoolsai:longDescription>
                <altova:exampleValues> 
                    <altova:example value="GNU General Public License v3"/> 
                </altova:exampleValues> 
            </xs:appinfo> 
        </xs:annotation> 
. . .
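
To illustrate how an application might consume such annotations, here is a minimal Python sketch that lists the elements marked as recommended. It reuses the `biotoolsai` namespace URI from the example above. The trimmed-down schema string and the function name `recommended_elements` are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
AI = "http://biotoolsregistry.org/appinfo"  # namespace taken from the example

# Hypothetical, trimmed-down schema used only for this sketch.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:biotoolsai="http://biotoolsregistry.org/appinfo">
  <xs:element name="license" type="xs:string">
    <xs:annotation>
      <xs:documentation>Software or data usage license</xs:documentation>
      <xs:appinfo>
        <biotoolsai:usage recommended="true"/>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
  <xs:element name="homepage" type="xs:anyURI"/>
</xs:schema>"""


def recommended_elements(xsd_text: str) -> list:
    """Return names of elements whose appinfo marks them as recommended."""
    root = ET.fromstring(xsd_text)
    names = []
    for element in root.iter(f"{{{XS}}}element"):
        usage = element.find(
            f"{{{XS}}}annotation/{{{XS}}}appinfo/{{{AI}}}usage"
        )
        if usage is not None and usage.get("recommended") == "true":
            names.append(element.get("name"))
    return names
```

The logic ("this field is recommended") stays in the schema file, and the application only needs to know the annotation vocabulary.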

Provide regex restricting syntax of links to Debian packages

Thread from Andreas...

> Forgive the very naive question, but do you maintain a list of links to packages (source, binary) currently available in the Debian distro ?
>
> I want to support linking to Debian packages from named tools in bio.tools.

May be either packages.debian.org or tracker.debian.org is what you are
seeking for depending from the amount of information you want to
present. For instance

https://packages.debian.org/bwa&exact=1
https://tracker.debian.org/bwa

> If not a link, I guess I could just support package names; in this case, is there a valid syntax for package names (so I can constrain this in our schema) ?

While there are syntactical constraints (lower case letters, numbers,
'-', '.', '+'; no upper case letters, no '_') you probably want to link
to existing packages which per definition will have a valid name. Or am
I missing something?
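
The constraints described in the thread can be sketched as a regular expression. The pattern below additionally assumes the Debian Policy rules that a name has at least two characters and starts with an alphanumeric character. The helper names are hypothetical, and the link form simply follows the tracker URL quoted in the thread.

```python
import re

# Lower-case letters, digits, '+', '-', '.'; no upper case, no '_'.
# Assumed from Debian Policy: at least two characters, alphanumeric first.
DEBIAN_PACKAGE_NAME = re.compile(r"[a-z0-9][a-z0-9+.-]+")


def is_debian_package_name(name: str) -> bool:
    """True if the whole string is a syntactically valid package name."""
    return DEBIAN_PACKAGE_NAME.fullmatch(name) is not None


def tracker_url(name: str) -> str:
    """Build a link in the form quoted in the thread (hypothetical helper)."""
    if not is_debian_package_name(name):
        raise ValueError(f"not a valid Debian package name: {name!r}")
    return f"https://tracker.debian.org/{name}"
```

As the reply points out, a syntactic check says nothing about whether the package exists; it only rules out obviously malformed names.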

Add Proprietary license to licenses

Such a term would include e.g. commercial licenses

Multiple EDAM concepts needed for a single output + operation|data|format HANDLES

Yet another example where multiple concepts are needed for 1 output is Meta-pipe, generating annotation of (meta)genome assembly (contigs) with found protein-coding genes, protein domains, and information about those, such as taxa, DB hits scores, etc.
The single data type that can be chosen, "Protein features", is a very distant generalisation of this, isn't it?

Reassignment of tools

Can you reassign the rights of "Orphanet portal for rare diseases and orphan drugs" and "Orphanet Rare Disease Ontology" to the user "Inserm US14" (the common user of Orphanet)?

Thank you

We need persistent unique IDs also for individual operations and input and output parameters

And perhaps also interfaces (of tools) and format options (of an input or output)

Bio.Tools Collections: IDs and other attributes ...

Other than numerous previous discussions in various groups, this issue is also supported by a request from Alfred Pühler, the coordinator of de.NBI, from 30th June 2016. (See the next comment where the content of the request is pasted.)

It also relates to some of the changes towards version 2.0, sketched at the TWW Hackathon in May 2016 in Paris: https://docs.google.com/spreadsheets/d/1_KGr2DkulwtAjFJzNjTm08zXVphFlVZ8p29Id6XFlxc (sheets to the right of the first sheet).

My notes and suggestions to the de.NBI request, and our previous discussions, are the following:

We should include also a good possibility to identify 'Collections' within Bio.Tools. That would mean allowing at least 2 attributes for each collection: a display name, and an "ID name". These two could be the same, but could also be e.g. "de.NBI" and "denbi", respectively. That should then allow dereferencing a collection at e.g. https://bio.tools/denbi instead of just an unspecific full-text search of https://bio.tools/?q=de.NBI.

In addition, we should consider other optional attributes of collections, such as description(s), super-collections (collectionA isIncludedIn collectionB), institutions, funding, credits, etc.
(I'm not sure about the collectionA isNewVersionOf collection, though. Although in a very special case it may make sense, e.g. if Bioconductor would change its name to BioCRAN 😆)

"Creative Commons Attribution NonCommerical NoDerivs" misspelled "Commercial"

Support NCBI taxon ID

For ideas on how, see:
bio-tools/biotoolsRegistry#104

doi syntax regex is too restrictive

The error below should not happen:

Element '{http://bio.tools}publicationsPrimaryID': [facet 'pattern'] The value '10.12688/f1000research.6924.1' is not accepted by the pattern '(doi:)?[0-9]{2}\.[0-9]{4}/.*'

The DOI mentioned above is legitimate: the second part of the DOI (registrant code) is not necessarily exactly 4 digits, see:
http://www.doi.org/doi_handbook/2_Numbering.html#2.2.2
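
The failure can be reproduced with a short Python sketch. `STRICT` is the pattern from the error message; `RELAXED` is only one possible loosening (any registrant code of four or more digits) and is an assumption, not the pattern adopted by the schema. `fullmatch` is used because XSD patterns implicitly match the whole value.

```python
import re

# Pattern quoted in the validation error above.
STRICT = re.compile(r"(doi:)?[0-9]{2}\.[0-9]{4}/.*")

# Hypothetical relaxed pattern: "10." followed by four or more digits.
RELAXED = re.compile(r"(doi:)?10\.[0-9]{4,}/.+")


def accepts(pattern: re.Pattern, doi: str) -> bool:
    """Mimic an XSD pattern facet, which must match the entire value."""
    return pattern.fullmatch(doi) is not None


doi = "10.12688/f1000research.6924.1"
strict_ok = accepts(STRICT, doi)    # five-digit registrant code is rejected
relaxed_ok = accepts(RELAXED, doi)  # accepted by the relaxed pattern
```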

Add "AWK" to language

for msutils.org

Support basic command-line syntax / summary

For command line overall, and options

Use markdown, following e.g.
https://technet.microsoft.com/en-us/library/cc771080(v=ws.11).aspx

Attributes specific for simple "collections" of other tools (from GO import)

The GO tools collection :
https://docs.google.com/spreadsheets/d/1-Gu6EhBXTDr35vOU2MwUVOzGAWXTAkIdARdad9UShRw/edit#gid=1268003428

These are officially "GO approved" tools and we need to annotate them as such. This is an attribute of the "collection" per se, and not the tools, thus a new attribute is needed.

Consider closely what other collection-specific attributes are required.

Version the standard in a semver.org manner

Summary

Given a version number MAJOR.MINOR.PATCH, increment the:

MAJOR version when you make incompatible API changes,
MINOR version when you add functionality in a backwards-compatible manner, and
PATCH version when you make backwards-compatible bug fixes.
Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.

http://semver.org/spec/v2.0.0.html
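
As a small illustration of the rules quoted above, here is a Python sketch of a version bump. The function name `bump` is hypothetical, and pre-release and build-metadata labels are deliberately not handled.

```python
def bump(version: str, part: str) -> str:
    """Increment one part of a plain MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":   # incompatible change: reset MINOR and PATCH
        return f"{major + 1}.0.0"
    if part == "minor":   # backwards-compatible addition: reset PATCH
        return f"{major}.{minor + 1}.0"
    if part == "patch":   # backwards-compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")
```

For a schema, one plausible reading is that adding an optional element is a MINOR change, while removing or renaming an element is a MAJOR one.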

Thanks!

Deprecate "Infrastructure", use "Collection"

As a quick fix.

(In addition will work awesomely after adding edamontology/edamontology#40)

For the further future, consider registering collections as separate "resource types", with various properties (now only name) available for filling in.

Rename Developer(s) to Main author(s) or so

The distinction between Developers and Contributors is very vague -- what is a developer?

I suggest either renaming Developers to Main authors, or adding even slightly more granularity via generic Persons with Roles (then should perhaps merge with Contacts and their Roles)

Support basic HTTP API spec

Including URL template (https://en.wikipedia.org/wiki/URL_Template) and endpoints
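
To make the idea of a URL template concrete, here is a minimal Python sketch of simple `{var}` expansion (roughly Level 1 of RFC 6570). The endpoint URL and variable names are made up for illustration, and operators such as `{?q}` are not supported.

```python
import re
from urllib.parse import quote


def expand(template: str, values: dict) -> str:
    """Replace each {name} with its percent-encoded value.

    Undefined variables expand to an empty string.
    """
    def replace(match: re.Match) -> str:
        value = values.get(match.group(1), "")
        return quote(str(value), safe="")

    return re.sub(r"\{([A-Za-z0-9_]+)\}", replace, template)


# Hypothetical endpoint template, not taken from any real tool.
url = expand(
    "https://example.org/api/tool/{id}/{format}",
    {"id": "signalp", "format": "json"},
)
```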

image metadata

Can we infer the image or container format from the file that is linked to?

Also speak to Christophe Blanchet re what is the useful information to expose about images/containers - is format enough, or is more needed?

urlftpType could hold http(s) ?

There is a urlftpType that restricts anyURI to either http(s) or (s)ftp (???).
On the other hand, urlType is restricted to http(s).
Something is wrong here...
Remove http(s) from urlftpType and rename urlType to urlhttpType (???).

Linking contributor names to their web pages, institutions, ORCIDs

(moved from edamontology/edamontology#38)
I would find it really cool if it would be possible to learn about the contributors if their names would be linked to their web pages. Also, could be nice having their names linked to affiliations.

Special character '@' in the project name makes it unreachable

The displayed name of the project shouldn't be used as an id and in the project url.
There should be a display name for the project and a separate ID name that is set upon creation.
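
One way to derive a URL-safe ID name from a display name is sketched below in Python. The rule (lower-case, keep only letters, digits, '-' and '_') and the function name `make_id` are assumptions for illustration; the actual bio.tools ID scheme may differ.

```python
import re


def make_id(display_name: str) -> str:
    """Derive a URL-safe ID from a display name (hypothetical rule)."""
    return re.sub(r"[^a-z0-9_-]+", "", display_name.lower())


project_id = make_id("my@tool")
```

Under this rule, "de.NBI" would map to "denbi". Letting the registrant override the derived ID at creation time would cover cases where the automatic result is ambiguous.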

Add mappings to DOAP & other main vocabularies

It'll allow integrating valuable descriptions such as https://github.com/common-workflow-language/workflows/blob/master/tools/GATK-PrintReads.cwl with Bio.Tools & co

xs:documentation source attribute must contain a URI

In beta1:

line 91:
<xs:element name="biotoolsId" type="biotoolsIdType">
<xs:annotation>
<xs:documentation source="The ID is a URL in the bio.tools namespace and reflects (normally exactly) the tool name and version: see http://biotools.readthedocs.io/.">Unique ID that is assigned upon registration of the software in bio.tools.</xs:documentation>

line 112:
<xs:element name="shortDescription">
<xs:annotation>
<xs:documentation source="A single declarative sentence in the present tense, providing a terse statement of the tool function. State what is done, i.e. operation, and primary inputs and outputs, but not how. Do not include tool name. See http://biotools.readthedocs.io/.">Short and concise textual description of the software function.</xs:documentation>

Rename to "bio.tools.Schema" or so, to avoid tech choice lock

Should be acted upon asap (before 2.0), not to repeat the mistake of BioXSD, not to get stuck with XSD in the name forever.

E.g. although XSD 1.1 is better and more expressive than 1.0, JSON Schema may be even more expressive. And even better schema languages may appear whenever, without warning ;-)

commandLineSpec.syntax element has no type specified

If it is supposed to contain any type, it should specify type="xs:anyType".
The same applies to commandLineSpec.option.syntax and credit.name.

Add Visual C++ as language?

It is not a language by itself (C++ is), but it can be very helpful to know that a program was not implemented in pure C++.

SALAD version of biotools schema

It would be helpful for integration into linked data efforts such as Common Workflow Language if there were a Schema Salad (https://github.com/common-workflow-language/schema_salad) version of the biotools schema; this would provide schema support for expressing the biotools data model in YAML, JSON-LD and RDF.

Improve functionality mentioned in Artaza et al. 2016

Hooray, Artaza et al. 2016 (10.12688/f1000research.9206.1) voted software "discoverability", including Bio.Tools registration, as the 2nd most important factor of good practices 👍 👍 👍

We should make sure that other mentioned factors and requirements ibid. - other than just bare registration - are well and soon supported too, where applicable.

add CDDL to licenses

as for Common Development and Distribution License

Guideline for tool short description

Provide only a terse statement of the tool function: what is done not how
Use a single declarative sentence in the present tense
Do not include tool name

Bake this into the comment?

Support virtual machine images

This has often been requested

We want a link to the image (or a repo) with light-weight metadata, i.e. not everything listed here:
http://docs.openstack.org/image-guide/image-metadata.html

Maybe just the disk and container format?

Q: Endpoint.Output vs Function.Output

There are two local "Output" elements that look the same.
Are they conceptually the same, or should two different classes be used for the implementation?

Input/Output duplicate attributes from dataType

Input and Output elements are defined as a restriction of dataType

dataType type already has "data" and "format" elements defined.
What is the reason for duplicating them in Input/Output (with the same definitions)?

Add non-article DOI field for everything & offer bibtex for export

& render using content negotiation trick with dx.doi.org to grab Bibtex

😸
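
The content negotiation trick can be sketched in Python as follows. It assumes that the DOI resolver honours the `application/x-bibtex` media type (as the Crossref and DataCite content negotiation services do) and uses `https://doi.org/` as the resolver host. The function names are hypothetical.

```python
import urllib.request
from urllib.parse import quote


def bibtex_request(doi: str) -> urllib.request.Request:
    """Build a request asking the DOI resolver for BibTeX instead of HTML."""
    return urllib.request.Request(
        f"https://doi.org/{quote(doi, safe='/')}",
        headers={"Accept": "application/x-bibtex"},
    )


def fetch_bibtex(doi: str) -> str:
    """Send the request (needs network access) and return the BibTeX text."""
    with urllib.request.urlopen(bibtex_request(doi), timeout=10) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_bibtex("10.12688/f1000research.9206.1")` would then return a BibTeX entry for that DOI, provided the registration agency supports the format.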

Support (links to) tool wrappers

e.g. CWL file, Galaxy tool config file etc.

For list see:
https://en.wikipedia.org/wiki/Bioinformatics_workflow_management_system

Also see:
http://bib.oxfordjournals.org/content/early/2016/03/23/bib.bbw020.full

Resource types: Docker images vs VMs

In the 'resource types' list, there is a type 'container' under which both docker images and VMs are categorized.
This is conceptually 'wrong'. Linux containerization is a totally different concept from VM. While each VM has its own OS, containers use the underlying kernel of the host OS. For containers, the underlying kernel must be Linux.

Also categorizing VMs under 'container' is confusing and misleading.

Also, in Docker the resource is called an 'image', not a 'container'. It is a container when it runs, but the resource is an image.

What I suggest is having two categories instead of one:

Virtual machine
Docker image

Make distinct those tools which are virtualised

... somehow

this requested by various groups and people

Support for organisational IDs

(for all the same good reasons as supporting ORCIDs)

The best we have may be Ringgold IDs from openidentify.com but they seem to have no API or access to data. ORCID are using / working with them though, see:
http://support.orcid.org/knowledgebase/articles/276884-how-are-organizations-identified-in-orcid

Add a Docker registry link

Add the possibility to enter a Docker registry URL for the tool, for example:

docker-registry.genouest.org/bioinfo/blast (meaning version latest)

or with a version tag

docker-registry.genouest.org/bioinfo/blast:1.0

With this, the user only needs to run:

docker pull *docker_url*

The schema could support multiple Docker registry URLs.
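
As a sketch of what the registry could do with such a value, the Python function below splits a reference of the form shown above into registry host, repository and tag. It assumes the first path component is always the registry host, as in the examples. The name `parse_image_ref` is hypothetical.

```python
def parse_image_ref(ref: str) -> tuple:
    """Split 'registry/repository[:tag]' into its three parts.

    A missing tag defaults to 'latest'. A colon before the last '/'
    is treated as a registry port, not as a tag separator.
    """
    last_slash = ref.rfind("/")
    colon = ref.rfind(":")
    if colon > last_slash:
        name, tag = ref[:colon], ref[colon + 1:]
    else:
        name, tag = ref, "latest"
    registry, _, repository = name.partition("/")
    return registry, repository, tag


untagged = parse_image_ref("docker-registry.genouest.org/bioinfo/blast")
tagged = parse_image_ref("docker-registry.genouest.org/bioinfo/blast:1.0")
```

Storing the parts separately would let several registries be listed for the same tool while still rendering a ready-to-use `docker pull` line.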