How should we specify the metadata endpoint? (seqcol-spec, closed)

ga4gh commented on August 28, 2024

How should we specify the metadata endpoint?

from seqcol-spec.

Comments (8)

nsheff commented on August 28, 2024

From today's discussion:

Server-scoped metadata, like the schema we described above, should be served by /service-info (#39).
Collection-scoped or sequence-scoped metadata don't fit under /service-info. These could perhaps go into a /metadata endpoint. But I think we should wait for the discussion on #40, since the "undigested attributes" could correspond to metadata, so how to serve those will come out of that discussion.
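
To make the server-scoped case concrete, here is a minimal Python sketch of a service-info payload for a seqcol server, with a check for the fields we understand the GA4GH service-info schema to require (id, name, type, organization and version at the top level; group, artifact and version inside type). The service id, names and the "seqcol" artifact value are hypothetical placeholders.

```python
# Top-level fields that, as we read the GA4GH service-info schema, are required.
REQUIRED_FIELDS = ("id", "name", "type", "organization", "version")
REQUIRED_TYPE_FIELDS = ("group", "artifact", "version")


def missing_service_info_fields(payload: dict) -> list[str]:
    """Return required service-info fields that are absent from `payload`."""
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    service_type = payload.get("type")
    if isinstance(service_type, dict):
        # `type` is itself an object that must name group, artifact and version.
        missing += [f"type.{k}" for k in REQUIRED_TYPE_FIELDS if k not in service_type]
    return missing


# Server-scoped metadata describes the service as a whole, not any collection.
# All values are hypothetical placeholders.
SERVICE_INFO = {
    "id": "org.example.seqcol",
    "name": "Example seqcol server",
    "type": {"group": "org.ga4gh", "artifact": "seqcol", "version": "1.0.0"},
    "organization": {"name": "Example Org", "url": "https://example.org"},
    "version": "0.1.0",
}
```

Collection-scoped metadata has no natural home in this payload, since nothing in it identifies a particular collection; that is the gap the /metadata idea was meant to fill.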

andrewyatz commented on August 28, 2024

@jb-adams, you have had some experience with service-info, I think? Also, one possible alternative to the metadata endpoint is to have the JSON send back the specific schema it uses:

{
  "$id": "http://yourdomain.com/schemas/myschema.json",
  "$schema": "http://json-schema.org/schema#"
}

Both bits here shamelessly stolen from JSON Schema's basics page.
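
One reading of this suggestion, sketched in Python under stated assumptions: the response carries a "$schema" key whose value is the "$id" of the schema it conforms to, and the client uses that URI to pick a validator. The registry, its contents and the shallow required-field check are hypothetical stand-ins for loading and applying a full JSON Schema.

```python
import json

# Hypothetical local registry: schema "$id" -> fields that schema requires.
# A real client would fetch and apply the full JSON Schema instead.
KNOWN_SCHEMAS = {
    "http://yourdomain.com/schemas/myschema.json": {"required": ["names", "lengths"]},
}


def check_against_declared_schema(raw: str) -> list[str]:
    """Parse a payload and check it against the schema it names in "$schema".

    Returns a list of problems; an empty list means this shallow check passed.
    """
    payload = json.loads(raw)
    schema_uri = payload.get("$schema")
    if schema_uri not in KNOWN_SCHEMAS:
        return [f"unknown schema: {schema_uri!r}"]
    required = KNOWN_SCHEMAS[schema_uri]["required"]
    return [f"missing field: {name}" for name in required if name not in payload]
```

Because the schema URI travels with the data, a client can reject a payload from a schema it does not know before attempting to interpret it.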

jb-adams commented on August 28, 2024

@andrewyatz @nsheff yes I'm quite familiar with /service-info. What are you thinking of in particular for this issue? At the outset, I think we can keep these design considerations in mind:

Implement the /service-info endpoint in such a way that it won't collide with another /service-info endpoint if multiple GA4GH API specs are implemented by a single web service (e.g. a refget + seqcol service). This may be as simple as implementing the endpoint at /collections/service-info rather than /service-info.
Extend the /service-info endpoint with custom attributes specific to seqcol. In htsget we added an htsget object to the base service-info response, which contains info about supported formats, etc. For seqcol, we could use this to inform clients whether an optional metadata endpoint is implemented, and/or which schema(s) the service supports, if we want to allow for multiple metadata schemas.
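
Both considerations can be sketched together in Python. This assumes a hypothetical "seqcol" extension object (with made-up metadata_endpoint and metadata_schemas fields) added to the base response in the same spirit as the htsget object, and a seqcol service-info mounted at /collections/service-info next to a refget-style /sequence/service-info. None of these field names come from the seqcol spec.

```python
import json

# Hypothetical mount points: namespacing each spec's service-info keeps a
# combined refget + seqcol server from exposing two conflicting /service-info.
SEQCOL_SERVICE_INFO_PATH = "/collections/service-info"
REFGET_SERVICE_INFO_PATH = "/sequence/service-info"

BASE_SERVICE_INFO = {
    "id": "org.example.seqcol",
    "name": "Example seqcol server",
    "type": {"group": "org.ga4gh", "artifact": "seqcol", "version": "1.0.0"},
    "organization": {"name": "Example Org", "url": "https://example.org"},
    "version": "0.1.0",
}


def seqcol_service_info(metadata_endpoint: bool, metadata_schemas: list[str]) -> dict:
    """Base service-info plus a seqcol-specific object, as htsget does with `htsget`.

    The `seqcol` key and its fields are illustrative; no spec defines them.
    """
    extension: dict = {"metadata_endpoint": metadata_endpoint}
    if metadata_endpoint:
        # Only advertise supported metadata schemas if the optional endpoint exists.
        extension["metadata_schemas"] = list(metadata_schemas)
    return {**BASE_SERVICE_INFO, "seqcol": extension}


def handle(path: str) -> str:
    """Toy dispatcher: both specs' service-info endpoints coexist without clashing."""
    responses = {
        SEQCOL_SERVICE_INFO_PATH: seqcol_service_info(
            True, ["https://example.org/schemas/seqcol-metadata/v1.json"]),
        REFGET_SERVICE_INFO_PATH: {"note": "refget service-info would be served here"},
    }
    return json.dumps(responses.get(path, {"error": "not found"}))
```

Keeping spec-specific attributes under a single key means a generic service-info client can ignore them, while a seqcol-aware client knows exactly where to look.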

andrewyatz commented on August 28, 2024

It's more about how to offer the same endpoint but with extensions. In service-info we just allowed individual specifications in OpenAPI to inherit our base schema and extend it. But that means you have to do it in OpenAPI, and there is no way to access the schema except by going into OpenAPI. Maybe we should consider what our best practice is here.

jb-adams commented on August 28, 2024

Oh, so how to extend the base ServiceInfo schema in OpenAPI? We did this in htsget with the allOf construct, basically importing all base attributes and adding our own:

htsgetServiceInfo:
      allOf:
        - '$ref': '#/components/schemas/ServiceInfo'
        - type: object
          properties:
            htsget:
              type: object
              description: extended attributes for htsget
              properties:
                datatype:
                  type: string
                  description: >
                    Indicates the htsget datatype category ('reads' or 'variants')
                    served by the ticket endpoint related to this service-info
                    endpoint
                  enum: [reads, variants]
                  example: reads
                formats:
                  type: array
                  description: >
                    List of alignment or variant file formats supported
                    by the htsget endpoint. If absent, clients cannot make
                    assumptions about what formats are supported ahead
                    of making a query.
                  items:
                    type: string
                    enum: [BAM, CRAM, VCF, BCF]
                fieldsParameterEffective:
                  type: boolean
                  description: >
                    Indicates whether the web service supports alignment field
                    inclusion/exclusion via the `fields` parameter. If absent,
                    clients cannot make assumptions about whether the `fields`
                    parameter is effective ahead of making a query.
                tagsParametersEffective:
                  type: boolean
                  description: >
                    Indicates whether the web service supports alignment tag
                    inclusion/exclusion via the `tags` and `notags` parameters.
                    If absent, clients cannot make assumptions about whether the
                    `tags` and `notags` parameters are effective ahead of making
                    a query.
        - type: object
          description: >
            This response extends the GA4GH Service Info specification
            with htsget-specific properties under the 'htsget' attribute.
            ServiceType 'artifact' property MUST be 'htsget' for both reads 
            and variants endpoints.
          required:
            - type
          properties:
            type:
              type: object
              required:
                - artifact
              properties:
                artifact:
                  type: string
                  enum: [htsget]
                  example: htsget

You'll see 3 objects under the allOf parameter. In order, they:

import the base Service schema from service info
add extended attributes under the htsget property
constrain the type.artifact value so that only htsget is allowable

Is this what you're referring to?
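
The allOf semantics described here (an instance must validate against every listed subschema) can be illustrated with a deliberately tiny, hand-rolled Python validator. It supports only type, enum, required, properties and allOf, and the schemas are trimmed-down stand-ins for the three branches of the YAML above, not the htsget definitions themselves; a real service would use a complete JSON Schema or OpenAPI validator.

```python
from typing import Any

# Deliberately tiny subset of JSON Schema: just enough keywords to mirror the
# three allOf branches (base schema, extension, artifact constraint).
_TYPES = {"object": dict, "array": list, "string": str, "boolean": bool}


def validate(instance: Any, schema: dict, path: str = "$") -> list[str]:
    """Return human-readable errors; an empty list means the instance is valid."""
    errors: list[str] = []
    # allOf: the instance must satisfy *every* subschema, so collect all errors.
    for sub in schema.get("allOf", []):
        errors += validate(instance, sub, path)
    expected = schema.get("type")
    if expected and not isinstance(instance, _TYPES[expected]):
        return errors + [f"{path}: expected {expected}"]
    if "enum" in schema and instance not in schema["enum"]:
        errors.append(f"{path}: {instance!r} not in {schema['enum']}")
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing {key!r}")
        for key, sub in schema.get("properties", {}).items():
            if key in instance:
                errors += validate(instance[key], sub, f"{path}.{key}")
    return errors


# Trimmed-down stand-ins for the three allOf branches.
BASE = {"type": "object", "required": ["id", "type"],
        "properties": {"type": {"type": "object"}}}
EXTENSION = {"type": "object", "properties": {"htsget": {
    "type": "object",
    "properties": {"datatype": {"type": "string", "enum": ["reads", "variants"]}}}}}
ARTIFACT_PIN = {"type": "object", "required": ["type"], "properties": {"type": {
    "type": "object", "required": ["artifact"],
    "properties": {"artifact": {"type": "string", "enum": ["htsget"]}}}}}
HTSGET_SERVICE_INFO = {"allOf": [BASE, EXTENSION, ARTIFACT_PIN]}
```

A payload that satisfies the base and extension branches but sets type.artifact to "seqcol" fails only the third branch, producing a single error at $.type.artifact; this is how the constraint object narrows the inherited base schema without redefining it.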

andrewyatz commented on August 28, 2024

The discussion at the seqcol meeting just now concluded that we should go the same route as refget, which specified the schema only in OpenAPI format. It was also decided that this issue will be split into two: one on whether to have this endpoint at all (and whether it is mandatory), and one on what the format of that response should be if so (assuming I understood the resolution correctly).

sveinugu commented on August 28, 2024

Hi, and thanks for including me in the seqcol meeting! I am a senior engineer employed by ELIXIR Norway (at the University of Oslo).

So the reason I was invited was that I am one of the main developers of the FAIRtracks draft standard (and related tool infrastructure) for metadata of genomic track files, which is the result of an ELIXIR implementation study: http://fairtracks.github.io. FAIRtracks is available in the form of a set of JSON schemas: https://github.com/fairtracks/fairtracks_standard/. It is for now a suggestion and is meant to evolve. So obviously the metadata aspect of seqcol is of interest to me, and adding seqcol support would be a natural extension. A manuscript has been written and will be submitted soon.

This seems to be a bit late in the process, so I hope I am not being too presumptuous here. I just wanted to present some initial thoughts:

As mentioned in the meeting, having a way for the metadata content to refer back to the schema would be useful for versioning purposes (and it would be nice to include it in the first version, so that downstream implementations don't have to add a special rule for the first version). In the FAIRtracks standard, we have added an '@Schema' field which contains a URL that includes a version string. Another useful feature of a '@Schema' field, as someone else mentioned, is that it provides a simple way to validate the payload.
I think it would be a good idea to reflect a bit on the FAIR principles (https://www.go-fair.org/fair-principles/). I think most of the points are already handled by the current specification or are not relevant, but at least some of them pose a challenge:

"I2. (Meta)data use vocabularies that follow FAIR principles": As mentioned by @nsheff, it would be nice if all relevant fields, such as source, would point to an ontology or vocabulary.
"I3. (Meta)data include qualified references to other (meta)data". This is my main idea here: it would be nice to have a pointer to a record describing the source content. In FAIRtracks, we make use of CURIE identifiers resolvable by https://identifiers.org for this (and we will probably also support n2t.net at some point). The seqcol identifier is new and meant to be used in place of such source-specific identifiers, but I think the metadata should contain the relation.
"R1. (Meta)data are richly described with a plurality of accurate and relevant attributes". This would not be natural for a minimal standard such as seqcol, but providing a resolvable source identifier (see I3) would make it relatively easy to access a larger set of relevant metadata fields. So the main approach of FAIRtracks in this context is to include the fields that are most useful, and refer to other records for the rest.
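
The versioning idea in the first point (a '@Schema' field whose URL includes a version string) could let clients decide whether they understand a payload before interpreting it. A minimal Python sketch, assuming a hypothetical URL layout in which the version is a path segment such as v1.2.0, and assuming a semantic-versioning-style policy where only the major version must match:

```python
import re
from urllib.parse import urlparse

# Hypothetical URL convention: the version is a path segment like "v1.2.0".
_VERSION_SEGMENT = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def schema_version(schema_url: str) -> tuple[int, ...]:
    """Extract (major, minor, patch) from a versioned '@Schema' URL."""
    for segment in urlparse(schema_url).path.split("/"):
        match = _VERSION_SEGMENT.match(segment)
        if match:
            return tuple(int(part) for part in match.groups())
    raise ValueError(f"no version segment in {schema_url!r}")


def is_compatible(schema_url: str, supported_major: int = 1) -> bool:
    """Assumed policy: accept payloads whose schema shares our major version."""
    return schema_version(schema_url)[0] == supported_major
```

Having the version present from the very first release means a client never needs a special case for payloads that predate the field.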

As to the question of whether metadata should be required or not, I am in the "required" camp, at least for the most important fields, which for me would be source identifier (as CURIE), organism identifier, and I think also version.
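
A minimal Python sketch of these required fields, assuming hypothetical field names source, organism and version: it checks that all three are present and that the identifier fields look like CURIEs, and it turns a CURIE into a resolvable URL in the prefix:accession form that identifiers.org (and n2t.net) accept. The CURIE pattern is a simplification of the registries' actual prefix rules, and the example values are placeholders.

```python
import re

# CURIE = prefix ":" local id, e.g. "taxonomy:9606". This prefix pattern is a
# simplification; identifiers.org keeps the authoritative prefix registry.
_CURIE = re.compile(r"^([A-Za-z][A-Za-z0-9._-]*):(\S+)$")

# Hypothetical field names for the minimal required metadata proposed above.
REQUIRED_METADATA = ("source", "organism", "version")
CURIE_FIELDS = ("source", "organism")


def curie_to_url(curie: str, resolver: str = "https://identifiers.org") -> str:
    """Turn a CURIE into a resolvable URL via a compact-identifier resolver."""
    match = _CURIE.match(curie)
    if not match:
        raise ValueError(f"not a CURIE: {curie!r}")
    prefix, local_id = match.groups()
    return f"{resolver}/{prefix}:{local_id}"


def metadata_problems(metadata: dict) -> list[str]:
    """List missing required fields and identifier fields that are not CURIEs."""
    problems = [f"missing {name!r}" for name in REQUIRED_METADATA if name not in metadata]
    for name in CURIE_FIELDS:
        value = metadata.get(name)
        if value is not None and not (isinstance(value, str) and _CURIE.match(value)):
            problems.append(f"{name!r} is not a CURIE: {value!r}")
    return problems
```

Storing identifiers as CURIEs rather than full URLs keeps the metadata independent of any single resolver; the resolver is chosen at lookup time.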

nsheff commented on August 28, 2024

Solved with "no metadata endpoint" decision in #54.
