
Convert 'misc' dataset into relaton-data-ietfmisc and bibxml-data-ietfmisc (bibxml-service issue, closed)

ietf-tools commented on August 16, 2024

Convert 'misc' dataset into relaton-data-ietfmisc and bibxml-data-ietfmisc

from bibxml-service.

Comments (19)

strogonoff commented on August 16, 2024

It already uses `ietf-ribose/bibxml-data-<dataset id>` and `ietf-ribose/relaton-data-<dataset id>` by default, with `main` branches. Documentation will reflect that.


ronaldtse commented on August 16, 2024

Are you sure? There is only bibxml-data-misc but no relaton-data-misc.


strogonoff commented on August 16, 2024

We can only index data in Relaton, and xml2rfc datasets are treated as aliases (just with custom path handling) to “proper” datasets (which must have both bibxml- and relaton-data). The situation with misc (and also I-Ds/Datatracker) shows that we need to handle data sources a bit differently. Legacy path handling that works without indexing and accompanying Relaton data is addressed in a larger upcoming dataset source restructuring. That restructuring will also likely eliminate the need for some bibxml-data repositories, since we can generate BibXML from Relaton on our own (which has not been the case due to mistaken assumptions that (1) IETF legacy API requires xml2rfc old XML verbatim, which it does not, and (2) bibxml-data uses xml2rfc as sources and preserves that verbatim XML, which it does not).

…

On 21 Dec 2021, at 1:28 AM, Ronald Tse wrote: Are you sure? There is only bibxml-data-misc but no relaton-data-misc.


strogonoff commented on August 16, 2024

Updated ticket description.


ronaldtse commented on August 16, 2024

We could do a one-time migration to convert the misc dataset into the Relaton format and be done with it.

But then we want to minimize the contents of the legacy misc dataset (i.e. replace its information with authoritative information). I'm just not sure if we have that data. So maybe the only reasonable approach now is to do a one-time migration.


strogonoff commented on August 16, 2024

We don’t actually need to index the misc dataset properly, and it’s not a real “source”. As we’ve determined, we only need to return BibXML data for preexisting paths, so the task can be simplified to “one-time crawl of legacy XML, map legacy filenames to new doctype/docids where possible, return new citation data where mapped, with fallback to crawled data where not”.
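
The lookup order described above (mapped citation first, crawled legacy XML as fallback) could look roughly like the sketch below. It is only an illustration: `DocRef`, `resolve_legacy_path`, and the three dictionaries are hypothetical names, not part of bibxml-service, and real storage would not be in-memory dicts.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class DocRef:
    """Hypothetical key for an indexed citation: document type plus ID."""
    doctype: str
    docid: str


def resolve_legacy_path(
    filename: str,
    mapping: Dict[str, DocRef],
    indexed: Dict[DocRef, str],
    crawled: Dict[str, str],
) -> Optional[Tuple[str, str]]:
    """Return (bibxml, origin) for a legacy filename, or None if unknown.

    mapping: legacy filename -> new doctype/docid (assumed to be hand-built)
    indexed: new citation data, already serialized to BibXML (assumption)
    crawled: XML captured verbatim by the one-time crawl (assumption)
    """
    ref = mapping.get(filename)
    # Prefer new citation data when the filename is mapped and indexed.
    if ref is not None and ref in indexed:
        return indexed[ref], "mapped"
    # Otherwise fall back to whatever the crawl captured for this filename.
    if filename in crawled:
        return crawled[filename], "fallback"
    # Not a preexisting path: nothing to return.
    return None
```

Returning the origin alongside the XML is a design choice of this sketch; it would let a caller log how often the fallback is hit and prioritize which mappings to add next.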


ronaldtse commented on August 16, 2024

we only need to return BibXML data for preexisting paths

I believe we still need to provide the misc dataset's data in the new service (not considering legacy paths).


ronaldtse commented on August 16, 2024

We will need to perform the following:

The current misc dataset needs to be converted into a Relaton dataset called relaton-data-ietfmisc
The bibxml-service needs to be able to handle the indexing of the relaton-data-ietfmisc and bibxml-data-ietfmisc datasets.

The misc dataset will no longer receive any updates. If we are able to find authoritative replacements for individual parts of its data, then we need to use the legacy path handling mechanism to redirect those identifiers to the proper dataset/data-item to return new information.


strogonoff commented on August 16, 2024

Your previous comment implies that an xml2rfc dataset would be treated as an authoritative source. This is not in alignment with the plan; we should rehash the requirements ASAP.

My intention is to treat it as a source of filenames for legacy system compatibility mapping, and use file contents only as fallback for nonexistent mappings. This plan does not require creation of relaton-/bibxml-data repositories. (It does require us to get a full dump of xml2rfc, whether by crawling or by asking IETF for it.)


strogonoff commented on August 16, 2024

I have described the updated implementation before, I think it’s mentioned in a ticket somewhere, might find it later

EDIT: filed https://github.com/ietf-ribose/bibxml-service/issues/49


ronaldtse commented on August 16, 2024

Your previous comment implies that an xml2rfc dataset would be treated as an authoritative source.

Unfortunately this is true for some of the content here. We do not have the full authoritative data for all entries in misc.

As we've discovered, the authoritative W3C dataset does not contain the particular entries described here. ITU has a plan but has not yet made its full bibliographic dataset available. For ISO, the dataset is again unavailable (I'll ask ISO IT).

So for the moment, would it be best to treat this dataset as authoritative?


ronaldtse commented on August 16, 2024

This has been done by @andrew2net. The datasets are currently named relaton-data-misc and bibxml-data-misc, but we might rename them later.

And indexed:


strogonoff commented on August 16, 2024

As we've discovered, the authoritative W3C dataset does not contain the particular entries described here.

Then we should have asked @andrew2net to use bibxml-misc as one of the sources when compiling relaton-data-w3c. My decision was to not merge these datasets, now I don’t know where we’re headed.


ronaldtse commented on August 16, 2024

@strogonoff we cannot merge these datasets because the W3C dataset is authoritative. The misc dataset is written by a third party (IETF) and is not validated by the authority.


ronaldtse commented on August 16, 2024

The current situation is correct.


strogonoff commented on August 16, 2024

If it’s not authoritative then we clearly shouldn’t have that data as part of authoritative data.

The intended plan is to make W3C documents from bibxml-misc available using xml2rfc compatibility APIs without further contaminating authoritative data.


ronaldtse commented on August 16, 2024

@strogonoff it is possible for us to create an augmented W3C dataset that “corrects” the W3C dataset, i.e. we move the W3C data from misc into the main W3C dataset. Then there is no “contamination”.


strogonoff commented on August 16, 2024

@ronaldtse Presuming authoritative data is what we get from W3C, if W3C adds the new citations then that would be fine—but then it would come from W3C, not from bibxml-misc. Until then we shouldn’t expose such data, except for xml2rfc consumers who expect it. Anyway, to be discussed


ronaldtse commented on August 16, 2024

Presuming authoritative data is what we get from W3C
Yes.

if W3C adds the new citations then that would be fine—but then it would come from W3C, not from bibxml-misc.
Until then we shouldn’t expose such data, except for xml2rfc consumers who expect it.

The BibXML service is only used by xml2rfc consumers. As per the RFP, we need to make the data from the misc dataset "available".

There are only two options available to us.

As we do not have the full and complete authoritative dataset from W3C (W3C itself might not have it either), we still have to offer the data from the "misc" dataset.
In our version of the W3C dataset, we might have to "augment" the data to provide items that are "published by W3C" but not authoritatively described by W3C (i.e. the manually curated bibliographic items from misc attributed to W3C).

Maybe the best way forward is to augment/supplement our W3C dataset to include the misc W3C data.
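
As a rough sketch of what such augmentation could mean in practice: authoritative entries always win, and misc entries are added only for document IDs the authoritative dataset lacks. The function name, the dict-based item shape, and the `provenance` field are all assumptions made for illustration; nothing here reflects the actual Relaton data model.

```python
from typing import Any, Dict

Item = Dict[str, Any]


def augment_dataset(
    authoritative: Dict[str, Item],
    supplemental: Dict[str, Item],
) -> Dict[str, Item]:
    """Merge two docid-keyed datasets without overwriting authoritative items.

    Every item in the result carries a hypothetical "provenance" field so
    that consumers can tell curated misc entries from authoritative ones.
    """
    merged: Dict[str, Item] = {
        docid: {**item, "provenance": "authoritative"}
        for docid, item in authoritative.items()
    }
    for docid, item in supplemental.items():
        # Only fill gaps; never replace data that came from the authority.
        if docid not in merged:
            merged[docid] = {**item, "provenance": "misc"}
    return merged
```

Keeping a provenance marker on each item is one possible way to address the “contamination” concern raised earlier in the thread, since supplemented entries stay distinguishable and can be dropped once the authority publishes its own record.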

I would also like to remind ourselves that all other publisher sources in the misc dataset are in the same situation (ANSI, ISO, CCITT, etc).

