bio-tools / biotoolsSchema
biotoolsSchema : Tool description data model for computational tools in life sciences
License: Creative Commons Attribution Share Alike 4.0 International
I could see that most of the bioconductor packages in bio.tools are annotated with
resourceType: Tool
interfaceType: Command line
Some other R packages however have:
resourceType: Library
interfaceType: API
I understand that these terms should be rather broad, with only a few of them in the enumeration; however, I think it would make sense to add something more specific here. What is an R package, really? I saw the definition "Command line : Text-based interface to a tool or service", which is of course also true for R, but many researchers who use R do so in a semi-graphical interface, and some of them are scared off by 'actual' command lines. I think being able to search explicitly for R packages would give them a better user experience.
... and include other core information, as this file is displayed here:
https://bio.tools/schema
As discussed with @ekry
Include clear & concise comments on biotoolsXSD constraints ("this field can only contain ...")
Requested by ELIXIR EXCELERATE WP 7 - new attribute to capture whether an API is WP7-compliant
Of course, something generic is needed.
Including:
xs:appinfo
is a standard mechanism for defining business logic beyond the expressive power of the XSD language.
It avoids the need for hard-coding such logic into an application that uses the given XSD-based data format.
. . .
<xs:schema ... xmlns:biotoolsai="http://biotoolsregistry.org/appinfo" ... xmlns:xs="http://www.w3.org/2001/XMLSchema" ... >
. . .
<xs:element name="license" minOccurs="0">
<xs:annotation>
<xs:documentation>Software or data usage license</xs:documentation>
<xs:appinfo>
<biotoolsai:usage recommended="true"/>
<biotoolsai:longDescription>
#Blah blah
`biotools:license` is blah blaaaah
## GRRRRRRR
**WOOBAR**, isn't it?
</biotoolsai:longDescription>
<altova:exampleValues>
<altova:example value="GNU General Public License v3"/>
</altova:exampleValues>
</xs:appinfo>
</xs:annotation>
. . .
Thread from Andreas...
"> Forgive the very naive question, but do you maintain a list of links to packages (source, binary) currently available in the Debian distro ?
> I want to support linking to Debian packages from named tools in bio.tools.
May be either packages.debian.org or tracker.debian.org is what you are seeking for depending from the amount of information you want to present. For instance
https://packages.debian.org/bwa&exact=1
https://tracker.debian.org/bwa
> If not a link, I guess I could just support package names; in this case, is there a valid syntax for package names (so I can constrain this in our schema) ?
While there are syntactical constraints (lower case letters, numbers, '-', '.', '+'; no upper case letters, no '_') you probably want to link to existing packages which per definition will have a valid name. Or am I missing something?"
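If package names were to be constrained in the schema after all, the syntax Andreas describes could be sketched as a pattern like the one below. This is only an illustration: the character set comes from the thread, while the two-character minimum and the leading alphanumeric are taken from the Debian Policy rules for the `Package` field, and the function name is hypothetical.

```python
import re

# Character set from the thread: lower-case letters, digits, '-', '.', '+'.
# Assumption (Debian Policy, not the thread): at least two characters,
# and the first one must be alphanumeric.
DEBIAN_PACKAGE_NAME = re.compile(r"[a-z0-9][a-z0-9+.\-]+")


def is_valid_debian_package_name(name: str) -> bool:
    """Return True if `name` is syntactically acceptable as a package name."""
    # fullmatch mimics XSD patterns, which always match the whole value.
    return DEBIAN_PACKAGE_NAME.fullmatch(name) is not None
```

The same expression could in principle be used in an XSD `pattern` facet, although, as Andreas points out, linking only to existing packages makes such a check largely redundant.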
Such a term would include e.g. commercial licenses
Yet another example where multiple concepts are needed for a single output is Meta-pipe, which annotates (meta)genome assemblies (contigs) with the protein-coding genes and protein domains found, plus information about them such as taxa, DB hit scores, etc.
The single data type that can currently be chosen, "Protein features", is far too general to describe this, isn't it?
Can you reassign the rights of "Orphanet portal for rare diseases and orphan drugs" and "Orphanet Rare Disease Ontology" to the user "Inserm US14" (the common user of Orphanet)?
Thank you
And perhaps also interfaces (of tools) and format options (of an input or output)
Other than numerous previous discussions in various groups, this issue is also supported by a request from Alfred Pühler, the coordinator of de.NBI, from 30th June 2016. (See the next comment where the content of the request is pasted.)
It also relates to some of the changes towards version 2.0, sketched at the TWW Hackathon in May 2016 in Paris: https://docs.google.com/spreadsheets/d/1_KGr2DkulwtAjFJzNjTm08zXVphFlVZ8p29Id6XFlxc (sheets to the right of the first sheet).
My notes and suggestions to the de.NBI request, and our previous discussions, are the following:
We should include also a good possibility to identify 'Collections' within Bio.Tools. That would mean allowing at least 2 attributes for each collection: a display name, and an "ID name". These two could be the same, but could also be e.g. "de.NBI" and "denbi", respectively. That should then allow dereferencing a collection at e.g. https://bio.tools/denbi instead of just an unspecific full-text search of https://bio.tools/?q=de.NBI.
In addition, we should consider other optional attributes of collections, such as description(s), super-collections (collectionA isIncludedIn collectionB), institutions, funding, credits, etc.
(I'm not sure about the collectionA isNewVersionOf collection, though. Although in a very special case it may make sense, e.g. if Bioconductor would change its name to BioCRAN)
For ideas on how, see:
bio-tools/biotoolsRegistry#104
The error below should not happen:
Element '{http://bio.tools}publicationsPrimaryID': [facet 'pattern'] The value '10.12688/f1000research.6924.1' is not accepted by the pattern '(doi:)?[0-9]{2}\.[0-9]{4}/.*'
The DOI mentioned above is legitimate: the second part of the DOI prefix (the registrant code, after "10.") is not necessarily exactly 4 digits, see:
http://www.doi.org/doi_handbook/2_Numbering.html#2.2.2
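The failure can be reproduced outside the schema with a short sketch. The first pattern is the one quoted in the error message; the relaxed pattern is only one possible fix and is an assumption, not the pattern actually adopted in the schema. Both use regex syntax that behaves the same in XSD and Python, and `fullmatch` stands in for XSD's implicit anchoring.

```python
import re

# Pattern quoted in the validation error above.
OLD_PATTERN = r"(doi:)?[0-9]{2}\.[0-9]{4}/.*"

# Hypothetical relaxed pattern: fixed "10." directory indicator,
# registrant code of four or more digits, non-empty suffix.
RELAXED_PATTERN = r"(doi:)?10\.[0-9]{4,}/.+"


def accepts(pattern: str, value: str) -> bool:
    """Mimic an XSD pattern facet, which must match the entire value."""
    return re.fullmatch(pattern, value) is not None


doi = "10.12688/f1000research.6924.1"
old_result = accepts(OLD_PATTERN, doi)          # five-digit registrant code is rejected
relaxed_result = accepts(RELAXED_PATTERN, doi)  # accepted by the relaxed pattern
```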
for msutils.org
For command line overall, and options
Use Markdown, following e.g.
https://technet.microsoft.com/en-us/library/cc771080(v=ws.11).aspx
The GO tools collection :
https://docs.google.com/spreadsheets/d/1-Gu6EhBXTDr35vOU2MwUVOzGAWXTAkIdARdad9UShRw/edit#gid=1268003428
These are officially "GO approved" tools and we need to annotate them as such. This is an attribute of the "collection" per se, and not the tools, thus a new attribute is needed.
Consider closely what other collection-specific attributes are required.
Summary
Given a version number MAJOR.MINOR.PATCH, increment the:
MAJOR version when you make incompatible API changes,
MINOR version when you add functionality in a backwards-compatible manner, and
PATCH version when you make backwards-compatible bug fixes.
Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.
http://semver.org/spec/v2.0.0.html
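As a rough illustration of the rules quoted above, the following sketch bumps a plain MAJOR.MINOR.PATCH string. The function names are hypothetical, and pre-release and build metadata labels are deliberately not handled.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a plain 'MAJOR.MINOR.PATCH' string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump(version: str, change: str) -> str:
    """Return the next version for a 'major', 'minor' or 'patch' change."""
    major, minor, patch = parse_version(version)
    if change == "major":   # incompatible change: reset MINOR and PATCH
        return f"{major + 1}.0.0"
    if change == "minor":   # backwards-compatible functionality: reset PATCH
        return f"{major}.{minor + 1}.0"
    if change == "patch":   # backwards-compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")
```

For a schema, an "incompatible change" would be one that makes previously valid documents invalid, such as removing an element or tightening a pattern.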
Thanks!
As a quick fix.
(In addition will work awesomely after adding edamontology/edamontology#40)
For the further future, consider registering collections as separate "resource types", with various properties (now only name) available for filling in.
The distinction between Developers and Contributors is very vague -- what exactly is a developer?
I suggest either renaming Developers to Main authors, or adding even slightly more granularity via generic Persons with Roles (then should perhaps merge with Contacts and their Roles)
Including URL template (https://en.wikipedia.org/wiki/URL_Template) and endpoints
Can we infer the image or container format from the file that is linked to?
Also speak to Christophe Blanchet re what is the useful information to expose about images/containers - is format enough, or is more needed?
There is a urlftpType that restricts anyURI to either http(s) or (s)ftp (???).
On the other hand, urlType is restricted to http(s).
Something is wrong here...
Suggestion: remove http(s) from urlftpType and rename urlType to urlhttpType. (???)
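The intended split can be sketched as two independent checks, one per type. This is an illustration of the proposal only, under the assumption that "(s)ftp" means the `ftp` and `sftp` schemes; the function names are hypothetical and the real types would be XSD pattern restrictions rather than code.

```python
from urllib.parse import urlsplit

HTTP_SCHEMES = {"http", "https"}   # proposed urlhttpType
FTP_SCHEMES = {"ftp", "sftp"}      # proposed urlftpType, assuming "(s)ftp" = ftp or sftp


def is_url_http_type(url: str) -> bool:
    """True if the URL uses http or https."""
    return urlsplit(url).scheme in HTTP_SCHEMES


def is_url_ftp_type(url: str) -> bool:
    """True if the URL uses ftp or sftp."""
    return urlsplit(url).scheme in FTP_SCHEMES
```

With this split, no URL satisfies both types, which removes the overlap described above.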
(moved from edamontology/edamontology#38)
It would be really useful if contributors' names were linked to their web pages, so that one can learn more about them. It would also be nice to have their names linked to their affiliations.
Special character '@' in the project name makes it unreachable
The displayed name of the project shouldn't be used as an ID or in the project URL.
There should be a display name for the project and a separate ID name, set upon creation.
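One way to keep the two names apart is to derive a URL-safe ID from the display name as a default at creation time, leaving the display name untouched. The rules below (lower-casing, hyphens for whitespace, dropping everything outside `a-z`, `0-9`, `_` and `-`) are assumptions made for illustration, not bio.tools behaviour, and the function name is hypothetical.

```python
import re


def make_id_name(display_name: str) -> str:
    """Derive a URL-safe ID from a free-text display name (illustrative rules)."""
    lowered = display_name.strip().lower()
    # Replace runs of whitespace with a single hyphen.
    hyphenated = re.sub(r"\s+", "-", lowered)
    # Drop every character that is not safe in a URL path segment,
    # including '@' and '.'; non-ASCII letters are dropped as well.
    return re.sub(r"[^a-z0-9_-]", "", hyphenated)
```

Under these rules a display name such as "de.NBI" yields the ID "denbi", and a name containing '@' still produces a reachable URL segment. Uniqueness of the resulting ID would have to be checked separately.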
It'll allow integrating valuable descriptions such as https://github.com/common-workflow-language/workflows/blob/master/tools/GATK-PrintReads.cwl with Bio.Tools & co
In beta1:
line 91:
<xs:element name="biotoolsId" type="biotoolsIdType">
<xs:annotation>
<xs:documentation source="The ID is a URL in the bio.tools namespace and reflects (normally exactly) the tool name and version: see http://biotools.readthedocs.io/.">Unique ID that is assigned upon registration of the software in bio.tools.</xs:documentation>
line 112:
<xs:element name="shortDescription">
<xs:annotation>
<xs:documentation source="A single declarative sentence in the present tense, providing a terse statement of the tool function. State what is done, i.e.operation, and primary inputs and outputs, but not how. Do not include tool name. See http://biotools.readthedocs.io/.">Short and concise textual description of the software function.</xs:documentation>
Should be acted upon asap (before 2.0), not to repeat the mistake of BioXSD, not to get stuck with XSD in the name forever.
E.g. although XSD 1.1 is better and more expressive than 1.0, JSON Schema may be even more expressive. And even better schema languages may appear whenever, without warning ;-)
If it is supposed to contain any type, it should declare type="xs:anyType".
The same for the commandLineSpec.option.syntax & credit.name
It is not a language by itself (C++ is), but it can be very helpful to know that a program was not implemented in pure C++.
It would be helpful for integration into linked data efforts such as Common Workflow Language if there were a Schema Salad (https://github.com/common-workflow-language/schema_salad) version of the biotools schema; this would provide schema support for expressing the biotools data model in YAML, JSON-LD and RDF.
Hooray, Artaza et al. 2016 (10.12688/f1000research.9206.1) voted software "discoverability", including Bio.Tools registration, as the 2nd most important factor of good practices
We should make sure that other mentioned factors and requirements ibid. - other than just bare registration - are well and soon supported too, where applicable.
as for Common Development and Distribution License
Bake this into the comment?
This has often been requested
We want a link to the image (or a repo) with light-weight metadata, i.e. not everything listed here:
http://docs.openstack.org/image-guide/image-metadata.html
Maybe just the disk and container format?
There are two local "Output" elements that look the same.
Are they conceptually the same, or should two different classes be used for the implementation?
Input and Output elements are defined as a restriction of dataType.
The dataType type already has "data" and "format" elements defined.
What is the reason to duplicate them in Input/Output (with the same definitions)?
& render using content negotiation trick with dx.doi.org to grab Bibtex
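The content negotiation trick amounts to requesting the DOI URL with an `Accept` header asking for BibTeX instead of HTML. A minimal sketch using only the standard library follows; it assumes the resolver at `https://doi.org/` (the current address of dx.doi.org) and the `application/x-bibtex` media type, and the function names are hypothetical.

```python
import urllib.request


def bibtex_request(doi: str) -> urllib.request.Request:
    """Build a request that asks the DOI resolver for a BibTeX record."""
    return urllib.request.Request(
        f"https://doi.org/{doi}",                    # assumption: doi.org resolver
        headers={"Accept": "application/x-bibtex"},  # content negotiation header
    )


def fetch_bibtex(doi: str) -> str:
    """Send the request and return the BibTeX text (needs network access)."""
    with urllib.request.urlopen(bibtex_request(doi), timeout=10) as response:
        return response.read().decode("utf-8")
```

Whether a given DOI actually returns BibTeX depends on its registration agency, so a caller should be prepared for an HTTP error or an HTML response.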
e.g. CWL file, Galaxy tool config file etc.
For list see:
https://en.wikipedia.org/wiki/Bioinformatics_workflow_management_system
Also see:
http://bib.oxfordjournals.org/content/early/2016/03/23/bib.bbw020.full
In the 'resource types' list, there is a type 'container' under which both docker images and VMs are categorized.
This is conceptually 'wrong'. Linux containerization is a totally different concept from VM. While each VM has its own OS, containers use the underlying kernel of the host OS. For containers, the underlying kernel must be Linux.
Also categorizing VMs under 'container' is confusing and misleading.
Also, in Docker the resource is called an 'image', not a 'container'. It is a container when it runs, but the resource is an image.
What I suggest is having two categories instead of one:
... somehow
This is requested by various groups and people
(for all the same good reasons as supporting ORCIDs)
The best we have may be Ringgold IDs from openidentify.com but they seem to have no API or access to data. ORCID are using / working with them though, see:
http://support.orcid.org/knowledgebase/articles/276884-how-are-organizations-identified-in-orcid
Add the possibility to enter a Docker registry URL for the tool, for example:
docker-registry.genouest.org/bioinfo/blast (meaning version latest)
or with a version tag
docker-registry.genouest.org/bioinfo/blast:1.0
with this, user only needs to do a
docker pull *docker_url*
The schema could support multiple Docker registry URLs.
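To show how such a value could be interpreted, here is a small sketch that splits a reference into repository and tag, defaulting to `latest` when no tag is given, as in the examples above. The names are hypothetical, and digest references (`@sha256:...`) are not handled.

```python
from typing import NamedTuple


class ImageReference(NamedTuple):
    repository: str  # registry host plus image path, without the tag
    tag: str


def parse_image_reference(reference: str) -> ImageReference:
    """Split 'registry/path/name[:tag]' into repository and tag."""
    # Only look for ':' in the last path component, so that a registry
    # port such as 'host:5000/name' is not mistaken for a tag.
    head, slash, last = reference.rpartition("/")
    name, _, tag = last.partition(":")
    return ImageReference(f"{head}{slash}{name}", tag or "latest")


def pull_command(reference: str) -> str:
    """Return the docker pull command line for a reference."""
    image = parse_image_reference(reference)
    return f"docker pull {image.repository}:{image.tag}"
```

Storing the reference as a single string keeps the schema simple, while a split like this would let a registry entry list several versions of the same image.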
Is it possible to register services, e.g. "conversion and upload service" or "biostatistics consultation service" in bio.tools?
required by msutils.org import
Template URL for binary packages:
packages.debian.org/binaryPackageName
Template URL to source packages is:
packages.debian.org/source/sourcePackageName
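The two templates could be filled in as sketched below. The `https://` scheme is an assumption, since the templates above omit it, and the function names are hypothetical.

```python
DEBIAN_PACKAGES_BASE = "https://packages.debian.org"  # assumption: https scheme


def binary_package_url(binary_package_name: str) -> str:
    """URL for a binary package, following the first template."""
    return f"{DEBIAN_PACKAGES_BASE}/{binary_package_name}"


def source_package_url(source_package_name: str) -> str:
    """URL for a source package, following the second template."""
    return f"{DEBIAN_PACKAGES_BASE}/source/{source_package_name}"
```

With templates like these, the schema would only need to store the package name and its kind (binary or source) rather than a full URL.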
Although there is no URL type, IMO it's better to put anyURI than nothing.
(Contact.url & Credit.url & Image.url)