Comments (5)

jkbonfield commented on June 29, 2024

It's also worth noting that even if this were being generated by MGI and we hadn't previously decided on DNBSEQ, it would be rejected. The problem with MGIG400 is that it conflates platform with model, which are two different tags (PL and PM).

from hts-specs.

jkbonfield commented on June 29, 2024

We did also discuss this sort of issue in the most recent conference call.

The issue was one of validation. Having an invalid field here does not invalidate the rest of the file. Syntactically it would all make sense. The point was raised: what if we check these fields and reject files that don't match, but the specification then gets updated? We haven't updated the SAM version number when we've added extra fields here, as the syntax is identical, so programs cannot check that either. That's a valid point. We felt the correct process would be, if validation is performed, to make it a warning only. This fits with the point above that unknown data here does not invalidate the remainder of the file.

You could then argue: what's the point of having a controlled vocabulary, if it doesn't stop people from adding anything there anyway (as demonstrated)? We feel there is still merit in having PL as a controlled vocabulary, as it gives vendors and users alike a clue as to what is expected. Without it we're highly likely to get ONT, OxfordNanopore, OxfordNanoporeTechnology, Oxford_Nanopore_Technology, etc. That's even ignoring the issue of case sensitivity. With it, we may still get invalid fields, but hopefully far fewer. Note that this also ties in with sequence submissions, as the archives have a controlled vocabulary in their schemas.
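As a rough illustration of the warning-only validation described above, here is a minimal Python sketch. The function name `check_platform` and the `KNOWN_PLATFORMS` set are hypothetical, and the set is only an illustrative subset of PL keywords; the authoritative list is the one in the SAM specification.

```python
import warnings

# Illustrative subset of PL keywords (an assumption for this sketch);
# consult the current SAM specification for the full list.
KNOWN_PLATFORMS = {"CAPILLARY", "DNBSEQ", "ILLUMINA", "ONT", "ULTIMA"}


def check_platform(value):
    """Return True for a known PL keyword; otherwise warn and return False.

    The check never raises, so an unknown value does not stop the caller
    from reading the rest of the file.
    """
    if value in KNOWN_PLATFORMS:
        return True
    warnings.warn(f"unrecognised @RG PL value {value!r}; continuing", stacklevel=2)
    return False


if __name__ == "__main__":
    check_platform("DNBSEQ")   # known keyword, no warning
    check_platform("MGIG400")  # emits a warning, processing continues
```

The comparison here is an exact, case-sensitive match; whether a validator should be more lenient about case is a separate decision that this sketch does not settle.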

brainstorm commented on June 29, 2024

Never mind, MGI doesn't seem to ship aligners. The PL field was most probably introduced erroneously by a third-party pipeline downstream.

zaeleus commented on June 29, 2024

> Having an invalid field here does not invalidate the rest of the file.

A file can either be well-formed or not. I'm not sure why you would want or trust a somewhat valid file w.r.t. a specification, especially in the scientific domain.

> We haven't updated the SAM version number when we've added extra fields here as the syntax is identical

Appending to a list of known values changes the syntax. PL:(CAPILLARY|DNBSEQ|etc) is not identical to PL:[ -~]+.
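To make the difference between the two patterns concrete, here is a small Python sketch. The alternation is only an illustrative subset standing in for the elided "etc" (the extra keywords are an assumption), and the names `ENUMERATED` and `PERMISSIVE` are hypothetical.

```python
import re

# Enumerated form: only listed keywords match (illustrative subset, not the full list).
ENUMERATED = re.compile(r"PL:(CAPILLARY|DNBSEQ|ILLUMINA|ONT|ULTIMA)")

# Permissive form: any run of one or more printable ASCII characters (space to tilde).
PERMISSIVE = re.compile(r"PL:[ -~]+")

for field in ("PL:DNBSEQ", "PL:MGIG400"):
    print(field,
          bool(ENUMERATED.fullmatch(field)),
          bool(PERMISSIVE.fullmatch(field)))
# prints:
# PL:DNBSEQ True True
# PL:MGIG400 False True
```

Every string accepted by the enumerated pattern is also accepted by the permissive one, but not the other way round, which is the distinction being drawn here.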

> We feel there is still merit in having PL as a controlled vocabulary, as it gives vendors and users alike a clue as to what is expected. Without it we're highly likely to get ONT, OxfordNanopore, OxfordNanoporeTechnology, Oxford_Nanopore_Technology, etc.

In this case, the spec shouldn't define them as valid values but as suggested values.

jmarshall commented on June 29, 2024

The syntax of header lines is described in the first paragraph of §1.3: the fields are TAB-separated, and the line matches the regexp shown (notwithstanding the minor UTF-8-related issue you noted elsewhere). So regardless of whatever characters are in the PL field value, there is no difficulty parsing it: there is a TAB before the PL: and the value extends to (but does not include) either end-of-line or the next TAB, whichever comes first.
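The parsing rule described above can be sketched in a few lines of Python. This is a simplified illustration, not a full SAM header parser: the function name `parse_header_line` is hypothetical, it assumes `TAG:VALUE` fields, and it does not treat free-text `@CO` comment lines specially.

```python
def parse_header_line(line):
    """Split a SAM header line into its record type and (tag, value) pairs.

    Fields are TAB-separated; each value runs from the first ':' up to
    (but not including) the next TAB or the end of the line.
    """
    fields = line.rstrip("\n").split("\t")
    record_type = fields[0]  # e.g. "@RG"
    pairs = []
    for field in fields[1:]:
        tag, sep, value = field.partition(":")
        if not sep:
            raise ValueError(f"field {field!r} is not of the form TAG:VALUE")
        pairs.append((tag, value))
    return record_type, pairs


if __name__ == "__main__":
    record_type, pairs = parse_header_line("@RG\tID:rg1\tPL:MGIG400\tSM:sample1")
    print(dict(pairs)["PL"])  # prints MGIG400
```

Nothing in this splitting step depends on whether the PL value is a recognised keyword, so an unexpected value is extracted just like any other.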

The list of keywords in the PL description is a list of semantically valid values. The syntax of the header line is unchanged when this list is appended to, as doing that does not affect parsing of that line or of the rest of that file or even (generally speaking) the interpretation of the rest of the file.

Note for example that ULTIMA was added to the spec when the SAM VN version number was already 1.6. Nonetheless a SAM file that says @HD VN:1.3 // @RG … PL:ULTIMA … is a perfectly valid SAM file. This is very intentional: parsing and understanding (the remainder of) the file is unaffected, so it would be silly for it to be invalid.

We've discussed this any number of times, e.g. on #454. I don't see any particular need to relitigate it, but if we do let's do it on a new open issue rather than a closed WONTFIX based on someone mistakenly specifying MGIG400 on their bwa command line.
