
Comments (19)

strogonoff commented on August 16, 2024

It already uses ietf-ribose/bibxml-data-<dataset id> and ietf-ribose/relaton-data-<dataset id> by default, with `main` branches. Documentation will reflect that.
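
As a rough illustration of that naming convention, the default repositories for a dataset could be derived from its ID as sketched here. The function name and the dictionary keys are hypothetical, and treating both repository families as following the same `<prefix>-<dataset id>` pattern is an assumption based on the names used in this thread, not the service's actual configuration code.

```python
# Hypothetical helper: derive default source repositories from a dataset ID.
# The organization and branch names come from the comment above; everything
# else (function name, dictionary keys) is made up for illustration.
ORG = "ietf-ribose"
DEFAULT_BRANCH = "main"


def default_repos(dataset_id: str) -> dict:
    """Return the assumed default repositories and branch for a dataset."""
    return {
        "bibxml": f"{ORG}/bibxml-data-{dataset_id}",
        "relaton": f"{ORG}/relaton-data-{dataset_id}",
        "branch": DEFAULT_BRANCH,
    }


if __name__ == "__main__":
    print(default_repos("misc"))
```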

from bibxml-service.

ronaldtse commented on August 16, 2024

Are you sure? There is only bibxml-data-misc but no relaton-data-misc.

[Screenshot, 2021-12-21 at 8:27:41 AM]

strogonoff commented on August 16, 2024

strogonoff commented on August 16, 2024

Updated ticket description.

ronaldtse commented on August 16, 2024

We could do a single-run migration to convert the misc dataset into the Relaton format and be done with it.

But then we want to minimize the contents of the legacy misc dataset (i.e., replace its information with authoritative information). I'm just not sure whether we have that data. So maybe the only reasonable approach now is to do a one-time migration.

strogonoff commented on August 16, 2024

We don’t actually need to index the misc dataset properly, and it’s not a real “source”. As we’ve determined, we only need to return BibXML data for preexisting paths, so the task can be simplified to: “one-time crawl of legacy XML, map legacy filenames to new doctype/docids where possible, return new citation data where mapped, with fallback to crawled data where not”.
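
The simplified task could look roughly like the sketch below. It is only an illustration under stated assumptions: `DocRef`, `LEGACY_MAP`, `CRAWLED_XML`, `resolve_legacy_path` and the example filenames are all hypothetical, and `lookup_citation` stands in for whatever the service uses to fetch new citation data. Falling back to crawled data when a mapped lookup returns nothing is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class DocRef:
    """Hypothetical pointer to an item in a new dataset."""
    doctype: str
    docid: str


# Hypothetical manual mapping: legacy filename -> new doctype/docid.
LEGACY_MAP: Dict[str, DocRef] = {
    "reference.EXAMPLE.mapped.xml": DocRef("W3C", "EXAMPLE-mapped"),
}

# Hypothetical result of the one-time crawl: legacy filename -> raw XML.
CRAWLED_XML: Dict[str, str] = {
    "reference.EXAMPLE.mapped.xml": "<reference anchor='legacy-mapped'/>",
    "reference.EXAMPLE.unmapped.xml": "<reference anchor='legacy-unmapped'/>",
}


def resolve_legacy_path(
    filename: str,
    lookup_citation: Callable[[DocRef], Optional[str]],
) -> Optional[str]:
    """Return new citation data where mapped, else the crawled legacy XML."""
    ref = LEGACY_MAP.get(filename)
    if ref is not None:
        xml = lookup_citation(ref)
        if xml is not None:
            return xml
    # No mapping (or the mapped item is missing): fall back to crawled data.
    return CRAWLED_XML.get(filename)
```

A caller would pass in its own lookup function; a path that was never crawled and has no mapping yields `None`, which the service could turn into a 404.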

ronaldtse commented on August 16, 2024

> we only need to return BibXML data for preexisting paths

I believe we still need to provide the misc dataset's data in the new service (not considering legacy paths).

ronaldtse commented on August 16, 2024

We will need to perform the following:

  1. The current misc dataset needs to be converted into a Relaton dataset called relaton-data-ietfmisc.
  2. The bibxml-service needs to be able to handle the indexing of the relaton-data-ietfmisc and bibxml-data-ietfmisc datasets.

The misc dataset will no longer receive any updates. If we are able to find authoritative replacements for individual parts of its data, then we need to use the legacy path handling mechanism to redirect those identifiers to the proper dataset/data-item to return new information.

strogonoff commented on August 16, 2024

Your previous comment implies that an xml2rfc dataset would be treated as an authoritative source. This is not in alignment with the plan; we should rehash the requirements ASAP.

My intention is to treat it as a source of filenames for legacy system compatibility mapping, and use file contents only as fallback for nonexistent mappings. This plan does not require creation of relaton-/bibxml-data repositories. (It does require us to get a full dump of xml2rfc, whether by crawling or by asking IETF for it.)

strogonoff commented on August 16, 2024

I have described the updated implementation before. I think it’s mentioned in a ticket somewhere; I might find it later.

EDIT: filed https://github.com/ietf-ribose/bibxml-service/issues/49

ronaldtse commented on August 16, 2024

> Your previous comment implies that an xml2rfc dataset would be treated as an authoritative source.

Unfortunately this is true for some of the content here. We do not have the full authoritative data for all entries in misc.

As we've discovered, the authoritative W3C dataset does not contain the particular entries described here. ITU has a plan but has not yet made its full bibliographic dataset available. For ISO the dataset is again unavailable (I'll ask ISO IT).

So for the moment, would it be best to treat this dataset as authoritative?

ronaldtse commented on August 16, 2024

This is done by @andrew2net; the datasets are currently relaton-data-misc and bibxml-data-misc, but we might rename them later.

And indexed:
[Screenshot, 2021-12-29 at 5:54:15 AM]

strogonoff commented on August 16, 2024

> As we've discovered, the authoritative W3C dataset does not contain the particular entries described here.

Then we should have asked @andrew2net to use bibxml-misc as one of the sources when compiling relaton-data-w3c. My decision was to not merge these datasets; now I don’t know where we’re headed.

ronaldtse commented on August 16, 2024

@strogonoff we cannot merge these datasets because the W3C dataset is authoritative. The misc dataset is written by a third party (IETF) and is not validated by the authority.

ronaldtse commented on August 16, 2024

The current situation is correct.

strogonoff commented on August 16, 2024

If it’s not authoritative then we clearly shouldn’t have that data as part of authoritative data.

The intended plan is to make W3C documents from bibxml-misc available using xml2rfc compatibility APIs without further contaminating authoritative data.

ronaldtse commented on August 16, 2024

@strogonoff it is possible for us to create an augmented W3C dataset that “corrects” the W3C dataset, i.e., we move the W3C data from misc into the main W3C dataset. Then there is no “contamination”.

strogonoff commented on August 16, 2024

@ronaldtse Presuming authoritative data is what we get from W3C, if W3C adds the new citations then that would be fine—but then it would come from W3C, not from bibxml-misc. Until then we shouldn’t expose such data, except for xml2rfc consumers who expect it. Anyway, to be discussed

ronaldtse commented on August 16, 2024

> Presuming authoritative data is what we get from W3C

Yes.

> if W3C adds the new citations then that would be fine, but then it would come from W3C, not from bibxml-misc.
> Until then we shouldn’t expose such data, except for xml2rfc consumers who expect it.

The BibXML service is only used by xml2rfc consumers. As per the RFP, we need to make the data from the misc dataset "available".

We have only two options.

  1. As we do not have the full and complete authoritative dataset from W3C (W3C itself might not have it either), we still have to offer the data from the "misc" dataset.
  2. In our version of the W3C dataset, we might have to "augment" the data to provide items that are "published by W3C" but not authoritatively described by W3C (i.e. the manually curated bibliographic items from misc attributed to W3C).

Maybe the best way forward is to augment/supplement our W3C dataset to include the misc W3C data.

I would also like to remind us that all other publisher sources in the misc dataset are in the same situation (ANSI, ISO, CCITT, etc.).
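
To make the "augment/supplement" option more concrete, here is a minimal sketch, not a decision or an existing implementation. It assumes each dataset can be viewed as a mapping from document ID to a bibliographic item; `augment_dataset` and the `source` provenance field are hypothetical and not part of the Relaton schema. Authoritative items always win, and misc items only fill gaps, tagged so consumers can tell them apart.

```python
from typing import Dict


def augment_dataset(
    authoritative: Dict[str, dict],
    misc: Dict[str, dict],
) -> Dict[str, dict]:
    """Supplement authoritative items with misc items that are missing.

    Every returned item carries a hypothetical "source" field recording
    where it came from; authoritative entries are never overwritten.
    """
    merged = {
        docid: {**item, "source": "authoritative"}
        for docid, item in authoritative.items()
    }
    for docid, item in misc.items():
        if docid not in merged:
            merged[docid] = {**item, "source": "misc"}
    return merged


if __name__ == "__main__":
    w3c = {"DOC-A": {"title": "Authoritative title"}}
    legacy = {
        "DOC-A": {"title": "Legacy title"},
        "DOC-B": {"title": "Only in misc"},
    }
    for docid, item in augment_dataset(w3c, legacy).items():
        print(docid, item)
```

Keeping the provenance tag would also make it straightforward to drop a misc item later, once the publisher supplies an authoritative record for it.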
