strogonoff commented on August 16, 2024

We will not implement other types of legacy patterns because they are not generic. If we need to maintain compatibility, we will have to store those patterns in the database. Perhaps an analysis of existing bib usage in RFCs/IDs is necessary.

Just in case, configuration should support more flexible generalised legacy path patterns soon.

  • If all varying parts of a legacy ref (which is in reference.{legacy_ref}.xml) map to Relaton data properties, it should work.
  • If not, then yes, a static map (either indeed database-based, or maybe better in configuration generated at build time) would be the way to go if required.

from bibxml-service.

strogonoff commented on August 16, 2024
  • Currently working pattern example: /public/rfc/bibxml6/reference.IEC_62531.2012_REDLINE.xml
  • I cannot reproduce the legacy patterns from the ticket description, because those standards don’t seem to exist in our bibxml-data-ieee
  • However, based on the working pattern and the other bibxml-data-ieee entries that we have, pattern 1 from the ticket description (/public/rfc/bibxml6/reference.IEEE.802.3.1_2011.xml) would have been implemented as /public/rfc/bibxml6/reference.IEEE_802-3.1.2011.xml

This is because files in bibxml-data-ieee are named in this way, and therefore such are our canonical references.

As with other legacy paths, currently we can either:

  • rename source bibxml-data-ieee files to match expected legacy pattern, or
  • specify simple substitutional reformatting in legacy path pattern (replacing punctuation with dots, etc.)
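
The second option, simple substitutional reformatting, could look roughly like the sketch below. It is inferred only from the legacy/canonical filename pairs mentioned in this thread; the function name `legacy_to_canonical` and the regular expression are assumptions for illustration, not the service's actual pattern syntax.

```python
import re
from typing import Optional

# Hypothetical pattern: "IEEE.<number>_<year>", e.g. "IEEE.802.3.1_2011".
LEGACY_IEEE = re.compile(r"^IEEE\.(?P<number>[0-9A-Za-z.]+)_(?P<year>\d{4})$")


def legacy_to_canonical(legacy_ref: str) -> Optional[str]:
    """Rewrite a legacy IEEE ref into the bibxml-data-ieee filename style.

    Returns None when the ref does not fit the assumed legacy shape.
    """
    match = LEGACY_IEEE.match(legacy_ref)
    if match is None:
        return None
    # Only the first dot in the number becomes a dash (802.3.1 -> 802-3.1).
    number = match.group("number").replace(".", "-", 1)
    return f"IEEE_{number}.{match.group('year')}"


print(legacy_to_canonical("IEEE.802.3.1_2011"))
```

A rule like this only covers refs that follow one consistent shape; anything else would return None and need another resolution strategy.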


ronaldtse commented on August 16, 2024
  1. The filename pattern of bibxml-data-ieee files is not important.
  2. relaton-data-ieee (and therefore bibxml-data-ieee) has a lot more content than the original bibxml6 directory. The original bibxml6 directory was manually crafted.

The only way to know whether every single file from bibxml6 exists in bibxml-data-ieee is to search for every item.

Increasingly, I think this is the way to go. We should have a static "map" between the old dataset and the new dataset because the identifiers are too unpredictable...


strogonoff commented on August 16, 2024

@ronaldtse Possible miscommunication alert…

My previous comment was written under the assumption that legacy paths need to correspond to actual preexisting legacy data.

Today I realised it’s a mistaken assumption, as I remembered that per your comment before (ietf-ribose/bibxml-project#5 (comment)) you said legacy paths just need to maintain the patterns, and don’t need to correspond to actual data (because we aren’t expected to have that legacy data in our bibxml-data- datasets).

However, the above response from you in this thread seems to indicate we would need to map to legacy data after all—i.e. not only maintain the patterns but make sure old preexisting XML files from XML2RFC tools are accessible via the new service?

I wonder if this is still an open question, requirements-wise.


ronaldtse commented on August 16, 2024

I think this is a question we need to clarify. @rjsparks mentioned that we should support legacy paths for backwards compatibility reasons.

For true backwards compatibility, the data served by a given path should be the same. However, there are two major differences that make doing so impractical:

  1. We are now using the RFC XML v3 format to serve BibXML data. Old implementations that read it will likely fail.
  2. There is more data per bibliographic file than the previously served files. This means that old implementations could also fail, and at the least, behave differently.

I would say that the intention is to provide:

  1. An identical reference. If the old path and old content point to IEEE 802.3a, then the legacy path (new system, old path) should lead to the same reference, IEEE 802.3a.
  2. The contents of the reference will differ. We will continue using RFC XML v3 instead of RFC XML v2 for the legacy paths, and accept that if the extra content in the response will cause implementation issues, the problem is at the implementation side.

In order to make a consistent map from legacy paths to the new dataset with 100% confidence that the paths are pointing to the same data, we will need to maintain possibly a "static map" from the legacy path item towards the new references.

@rjsparks is this the approach you're thinking of?


rjsparks commented on August 16, 2024

As the RFP calls out, there are deployed tools that need to continue to work with the legacy paths. We will deploy the new service such that it either backs the legacy URLs directly, or will proxy or redirect to the new URLs, but the path structure should remain the same.

We do not need to replicate providing references using the v2 grammar - the references should be in the v3 format. I think all the known tools will do the reasonable things.


rjsparks commented on August 16, 2024

Earlier there was discussion of a demo instance that we could poke at - is such a thing already available?
(edit) : nm - I relocated the server you've previously pointed to.


rjsparks commented on August 16, 2024

So - to make this all a bit clearer, perhaps:

See, e.g., https://www.ietf.org/archive/id/draft-ietf-stir-messaging-01.xml
Note the many external entity declarations that look like:

<!ENTITY RFC8174 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8174.xml">

Other documents might point internally to https://xml2rfc.tools.ietf.org/public/rfc/bibxml-rfcs/reference.RFC.8174.xml.

When existing code processes this, it should still work. We will ensure that references to xml.resource.org (to the extent possible), xml2rfc.tools.ietf.org, and xml2rfc.ietf.org still resolve, and will serve them from the product you are making. We want to be able to easily configure those redirects (or proxies) to point to the new correct place.

Further, please skim the code at:
https://trac.ietf.org/trac/xml2rfc/browser/trunk/cli/xml2rfc/parser.py#L56
https://trac.ietf.org/trac/xml2rfc/browser/trunk/cli/xml2rfc/parser.py#L468
These network_locs strings should be all we have to replace to begin using the new service.

The code (with all the assumptions it makes about the structure of the URL below the configured network_locs) at
https://trac.ietf.org/trac/xml2rfc/browser/trunk/cli/xml2rfc/parser.py#L240
https://trac.ietf.org/trac/xml2rfc/browser/trunk/cli/xml2rfc/parser.py#L275
https://trac.ietf.org/trac/xml2rfc/browser/trunk/cli/xml2rfc/parser.py#L305
should work without modification (until we decide to improve/optimize using the API directly instead of these legacy paths).
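
To illustrate the URL structure those tools depend on, here is a minimal sketch that splits a legacy URL into its directory and reference name, independent of the host. The function name `split_legacy_url` and the regular expression are assumptions made for illustration; they are not taken from xml2rfc or from bibxml-service.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Assumed legacy layout: /public/rfc/<dirname>/reference.<ref>.xml
LEGACY_PATH = re.compile(
    r"^/public/rfc/(?P<dirname>[^/]+)/reference\.(?P<ref>.+)\.xml$"
)


def split_legacy_url(url: str) -> Optional[Tuple[str, str]]:
    """Return (dirname, ref) for a legacy bibxml URL, or None if it does not fit."""
    match = LEGACY_PATH.match(urlsplit(url).path)
    if match is None:
        return None
    return match.group("dirname"), match.group("ref")


print(split_legacy_url(
    "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8174.xml"
))
```

Because only the path is inspected, the same logic applies whichever legacy host name is redirected or proxied to the new service.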

v3 was designed to be backwards compatible with v2 - the references constructed shouldn't be so different that v2 processors would break on what's produced with v3 as an intended target. If you think you've identified a place where what you're constructing might, please provide an example.


ronaldtse commented on August 16, 2024

From #31:

Current bibxml reference: https://xml2rfc.tools.ietf.org/public/rfc/bibxml6/reference.IEEE.802.11_2012.xml
New bibxml legacy reference: http://34.229.41.119:8000/public/rfc/bibxml6/reference.IEEE_802-11.2012.xml

Note that the file name is different: reference.IEEE.802.11_2012 vs reference.IEEE_802-11.2012.

@strogonoff there are two issues here.

Support a fuzzy legacy path match

In the IEEE legacy path, we want to support resolving the legacy pattern IEEE.802.11_2012.xml to the IEEE PubID-based entry of IEEE 802-11:2012.

Name the "anchor" attribute identical to the legacy path file name

When returning an IEEE legacy path BibXML XML output, we want to reflect the same "anchor".

Today the legacy path provides this output:
https://xml2rfc.tools.ietf.org/public/rfc/bibxml6/reference.IEEE.802.11_2012.xml

<reference anchor="IEEE.802.11_2012" target="http://ieeexplore.ieee.org...">

Notice that the legacy path IEEE.802.11_2012.xml shares an identical prefix to that of the anchor IEEE.802.11_2012. Presumably, an author will use IEEE.802.11_2012 inside the document to reference this particular bibliographic item.

This means that existing documents rely on this anchor being identical to the file path, and thus we have to keep the anchor identical.

Technically this is poor practice due to the mixing of serving location and data identification, but this is an established practice in IETF authoring that is out of our scope to change.


strogonoff commented on August 16, 2024

@ronaldtse Question: why should we support a fuzzy legacy match, instead of having a static mapping? It’s clear that legacy consumers must have exact filenames to start with. We just need to map legacy paths to up-to-date citations, whether automatically or not.

(Obviously, fuzzy match cannot be guaranteed to return correct results!)

For the second part (the anchor), it looks like you are suggesting altering the old XML contents and substituting paths? Are you sure it won’t break legacy consumers? If so, I think our best bet is to provide a GitHub source by crawling xml2rfc tools (which we might have to do anyway) and doing the requisite processing on XML as part of that crawl, rather than trying to manipulate this in realtime.


ronaldtse commented on August 16, 2024

@strogonoff because a fuzzy match is easier to maintain than a static-string to static-string match.

(Obviously, fuzzy match cannot be guaranteed to return correct results!)

Indeed, you are correct. There are clearly just two ways of handling legacy paths:

  1. Make a "legacy filename" to new "document identifier" mapping for all legacy paths. e.g. "reference.3GPP.XX.YY" => "3GPP XX.YY"
  2. Use a string matching pattern to map the legacy to new. It clearly doesn't work for all datasets (e.g. NIST dataset), but for certain ones that have legacy filenames defined consistently (e.g. 3GPP), it is possible.
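
The two approaches can also be combined, with the static map taking precedence and patterns acting as a fallback for consistently named datasets. The sketch below assumes this ordering; the names `STATIC_MAP`, `PATTERNS`, and `resolve_docid` are hypothetical, and the 3GPP rule simply follows the "reference.3GPP.XX.YY" => "3GPP XX.YY" example above rather than any verified identifier format.

```python
import re
from typing import Optional

# Option 1: explicit legacy filename -> document identifier entries.
STATIC_MAP = {
    "reference.IEEE.802.11_2012": "IEEE 802-11:2012",
}

# Option 2: (pattern, template) pairs for datasets with consistent naming.
PATTERNS = [
    (re.compile(r"^reference\.3GPP\.(?P<rest>.+)$"), "3GPP {rest}"),
]


def resolve_docid(legacy_name: str) -> Optional[str]:
    """Map a legacy filename (without ".xml") to a document identifier."""
    if legacy_name in STATIC_MAP:
        return STATIC_MAP[legacy_name]
    for pattern, template in PATTERNS:
        match = pattern.match(legacy_name)
        if match is not None:
            return template.format(**match.groupdict())
    return None  # Caller decides what to do, e.g. fall back to an archive.


print(resolve_docid("reference.3GPP.23.501"))
```

Checking the static map first means a hand-verified entry always wins over a pattern-derived guess.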

For the second part (the anchor), it looks like you are suggesting altering the old XML contents and substituting paths

That's not what I'm saying.

I'm saying that:

  1. We will serve new content to the legacy paths. The whole point of handling legacy paths is to have the BibXML service provide old clients with up-to-date content.
  2. When serving the BibXML files, notice that the "filename" requested is identical to the anchor attribute within the BibXML file. This is the current practice of IETF authors and tooling, where they expect the "filename" to be identical to the anchor attribute.


strogonoff commented on August 16, 2024

If we return new data from other sources for xml2rfc paths, then XML anchors will not match old xml2rfc anchors (presumably, the anchors in new data will contain some authoritative/canonical identifier, while the anchors in xml2rfc files match filenames that were arbitrarily assigned by humans).

So in addition to map or fuzzy-match, it still looks like we have to substitute anchors in XML based on whatever filename was in the incoming request (in case of legacy request, it would not match our anchor), on the fly. Unless I am misunderstanding you.

I’d rather get this clarified in case our fundamental legacy path handling requirements are jumping from “find and return” to something more like “find, parse, construct and return”.
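
As a rough illustration of the "parse, construct" step, the sketch below replaces the anchor of a reference element with the name from the incoming legacy request. It uses Python's standard xml.etree.ElementTree; the function name `with_legacy_anchor` and the sample XML are hypothetical and are not the service's actual code or data.

```python
import xml.etree.ElementTree as ET


def with_legacy_anchor(bibxml: str, legacy_anchor: str) -> str:
    """Return the BibXML string with its root anchor set to legacy_anchor."""
    root = ET.fromstring(bibxml)
    if root.tag != "reference":
        raise ValueError(f"expected <reference> root, got <{root.tag}>")
    root.set("anchor", legacy_anchor)
    # Note: ElementTree drops any XML declaration and comments on re-serialization.
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample of newly generated XML with a canonical-style anchor.
sample = (
    '<reference anchor="IEEE_802-11.2012">'
    "<front><title>Example title</title></front>"
    "</reference>"
)
print(with_legacy_anchor(sample, "IEEE.802.11_2012"))
```

Doing this per request keeps stored data canonical, at the cost of parsing XML on every legacy hit.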


TonyLHansen commented on August 16, 2024

I think "find, parse, construct and return" might be required.


strogonoff commented on August 16, 2024

Here’s a report for bibxml6 xml2rfc paths when “auto” resolution was in effect (results are not great, as diffs show):
bibxml6-report-with-auto.zip

Here’s a report for bibxml6 xml2rfc paths with current logic:
bibxml6-report-manual-only.zip

Current logic means most paths fall back to xml2rfc archive for now, returning identical XML to before, except these two which successfully map to these standards resulting in new XML:

XML diffs are available in HTML report in the second zip archive above.

@ronaldtse could you confirm that above mappings are right, just in case? If so, this can be closed.


strogonoff commented on August 16, 2024

I think we need to either A) update mappings or B) wait until pubid-ieee gives finalized identifiers we can use in relaton-data-ieee (metanorma/pubid-ieee#72). I think we’d want to do (B), because otherwise we’ll need to switch mappings back and forth, but if it takes too long we should do (A) instead ASAP. (cc @ronaldtse)
