ietf-tools / bibxml-service
Django-based Web service implementing IETF BibXML APIs
Home Page: https://bib.ietf.org
License: BSD 3-Clause "New" or "Revised" License
If I select the "IEEE" dataset, then enter "IEEE C57-12-20.2011", it works:
But if I enter "C57-12-20.2011", nothing is found:
Having to select the dataset ("IEEE") and then also enter the document identifier with the "IEEE" prefix seems redundant.
This seems to be an issue with the search engine matching strings.
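One possible workaround, sketched here under assumptions, is to try the query both as typed and with the selected dataset's label prepended. The function name `candidate_queries` is hypothetical and is not part of the service; how the real search matches strings is not shown in this issue.

```python
def candidate_queries(dataset_label, query):
    """Return the query as typed, plus a dataset-prefixed variant if the prefix is missing.

    Hypothetical helper: the real service may normalize identifiers differently.
    """
    q = query.strip()
    prefix = dataset_label.upper()
    candidates = [q]
    # Only add the prefixed form when the user did not already type the prefix.
    if not q.upper().startswith(prefix + " "):
        candidates.append(f"{prefix} {q}")
    return candidates
```

With this, "C57-12-20.2011" searched in the IEEE dataset would also be tried as "IEEE C57-12-20.2011", the form that is reported to work.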
W3C, ISO, ITU, ANSI, FIPS, CCITT [previous name of ITU], IEEE, OASIS, PKCS
W3C, ISO, ITU, FIPS are all supported by Relaton (the latter two especially are provided by the authoritative parties).
OASIS has indicated they are willing to provide bibdata (in a separate project, but the data can also be useful for IETF).
We will need to figure out CCITT documents (legacy docs from ITU).
PKCS has no authority now (they were published by RSA), so we can just move that content into a static dataset.
ANSI we will need to figure out.
Originally from ietf-ribose#10 (comment)
Requirements (to reiterate):
Adjusted legacy path implementation is as follows:
Each legacy path maps to a docid (a pair in Relaton data). For example, /public/rfc/bibxml9/reference.BCP.0004.xml should be converted to a docid like { "type": "IETF", "id": "IETF BCP 4" }
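The conversion in the example above can be sketched as follows. This is a minimal illustration, not the service's implementation: the regular expression and the function name `subseries_path_to_docid` are assumptions, and only the BCP/FYI/STD filename shape mentioned in these issues is handled.

```python
import re

# Hypothetical pattern for legacy RFC subseries paths (bibxml9 / bibxml-rfcsubseries).
SUBSERIES_PATH = re.compile(
    r"^/public/rfc/(?:bibxml9|bibxml-rfcsubseries)/"
    r"reference\.(BCP|FYI|STD)\.(\d+)\.xml$"
)


def subseries_path_to_docid(path):
    """Convert a legacy subseries path to a docid dict, or return None if it does not match."""
    m = SUBSERIES_PATH.match(path)
    if m is None:
        return None
    series, number = m.groups()
    # int() drops the zero padding used in legacy filenames (0004 -> 4).
    return {"type": "IETF", "id": f"IETF {series} {int(number)}"}
```

For the path in the example, this yields the docid shown there; any path that does not match the assumed shape returns None so the caller can fall back to other handling.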
IETF Internet-Drafts (bibxml3, bibxml-id)
Legacy pattern(s) to implement:
We need to parse the pattern to return the appropriate BibXML content.
Originally planned to be done as part of bibxml-project, but now that we have only one codebase repository it may as well go here.
Currently the BibXML service does not provide an individual "show" page for RFC subseries documents as it does for the other datasets.
This is because an "RFC subseries document" contains multiple RFCs. Each "RFC subseries document" contains individual metadata and also one or more RFCs, which form part of the "RFC subseries document".
This relationship is represented as a "document relation" in the Relaton data.
We need to handle this new data structure for display.
This is a moderately far-out idea for now, at least as far as I’m concerned, but a GraphQL API is an option that might be very feasible given the current architecture.
It has its downsides (e.g., consumers may start depending on citation attributes even though the data structure may change as citation sourcing evolves; inconsistencies between data sources will become more obvious and irritating, since some sources contain more data than others, so a finer-grained query may unintentionally exclude citations; higher complexity; etc.), but also some upsides which may outweigh them now or in the near future (although I don’t think GQL should be made the primary supported API).
When a search from the web UI has zero matches, show a message indicating that.
Example: https://demo.bibxml.org/search/?query=%2Bieee+foobar
Right now the web UI gives the impression that it's still searching even though there are no matches.
We need to support the legacy path patterns for the following datasets.
- bibxml-id or bibxml3
- bibxml-rfcsubseries or bibxml9
Originally posted by @ronaldtse in ietf-ribose#7 (comment)
Here’s our API specification for BibXML service: openapi.yaml. The API is evolving and can change in the coming week or so, but is it overall correct / on the right track?
The API describes two endpoints:
- /ref/ for retrieving a single standard’s metadata given dataset ID and standard reference.
- /search/ for querying standards. Search parameters:
  - fields, which for now simply matches provided values with whatever is in the index (e.g., { "fields": { "id": 1234, "doctype": "standard" } }).
  - dataset field, without which search is performed across all datasets.
  - limit and offset for windowing returned data.
IANA references (bibxml8)
Legacy pattern(s) to implement:
IEEE
W3C
3GPP
IANA
NIST
RFC Subseries
IETF RFC
http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.NNNN.xml
IETF Internet-Draft
http://xml2rfc.ietf.org/public/rfc/bibxml-ids/reference.I-D.example-name.xml
http://xml2rfc.ietf.org/public/rfc/bibxml-ids/reference.I-D.draft-example-name-99.xml
IETF RFC Subseries
http://xml2rfc.ietf.org/public/rfc/bibxml-rfcsubseries/reference.BCP.0099.xml
http://xml2rfc.ietf.org/public/rfc/bibxml-rfcsubseries/reference.FYI.0099.xml
http://xml2rfc.ietf.org/public/rfc/bibxml-rfcsubseries/reference.STD.0099.xml
W3C
http://xml2rfc.ietf.org/public/rfc/bibxml4/reference.W3C.REC-example-name-date.xml
3GPP
http://xml2rfc.ietf.org/public/rfc/bibxml5/reference.SDO-3GPP.1234.xml
http://xml2rfc.ietf.org/public/rfc/bibxml-3gpp/reference.SDO-3GPP.1234.xml
IEEE
http://xml2rfc.ietf.org/public/rfc/bibxml6/reference.IEEE.12345_date.xml
http://xml2rfc.ietf.org/public/rfc/bibxml-ieee/reference.IEEE.12345_date.xml
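A rough sketch of how the legacy URLs listed above could be split into a dataset and a reference string. The directory-to-dataset table only contains the directories that appear in the list above; the function name `split_legacy_url` and the returned labels are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Legacy directory names taken from the URL list above; the labels are illustrative.
LEGACY_DIRS = {
    "bibxml": "IETF RFC",
    "bibxml-ids": "IETF Internet-Draft",
    "bibxml-rfcsubseries": "IETF RFC Subseries",
    "bibxml4": "W3C",
    "bibxml5": "3GPP",
    "bibxml-3gpp": "3GPP",
    "bibxml6": "IEEE",
    "bibxml-ieee": "IEEE",
}


def split_legacy_url(url):
    """Return (dataset label, reference part of the filename), or None if the URL is not recognized."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 4 or parts[:2] != ["public", "rfc"]:
        return None
    directory, filename = parts[2], parts[3]
    if directory not in LEGACY_DIRS:
        return None
    if not (filename.startswith("reference.") and filename.endswith(".xml")):
        return None
    # Strip the fixed "reference." prefix and ".xml" suffix.
    ref = filename[len("reference."):-len(".xml")]
    return LEGACY_DIRS[directory], ref
```

Numbered and named directories (for example bibxml6 and bibxml-ieee) resolve to the same label, so both URL forms can share one lookup path.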
From @strogonoff :
@yablokov says the data in Relaton YAML is not consistent across standard data repos
“Queue reindex”, “revoke task” buttons don’t work without JS, and the code is contained in <script> within HTML.
IETF RFCs (bibxml)
Legacy pattern(s) to implement:
https://{hostname}/public/rfc/bibxml/reference.RFC.{NNNN}.xml
We need to parse NNNN and return the appropriate RFC NNNN in BibXML format.
The (in testing) BibXML service is now deployed at:
The data sets of w3c, ieee, iana and nist are available.
The following data sets are still in progress:
The management interface of the indexer is accessible at: https://demo.bibxml.org:8000. We will supply the login method shortly.
IEEE (bibxml6)
Legacy pattern(s) to implement:
We will not implement other types of legacy patterns because they are not generic. If we need to maintain compatibility we will have to store those patterns in the database. Perhaps an analysis of existing bib usage in RFCs/IDs is necessary.
Misc collection (W3C, ISO, ITU, ANSI, FIPS, CCITT [previous name of ITU], IEEE, OASIS, PKCS) (bibxml2)
Legacy pattern(s) to implement:
We need to parse the pattern to return the appropriate BibXML content.
We currently need two repositories per dataset. The relaton-data one for search indexing (as it provides more data) and the bibxml-data one for serving BibXML.
The latter dataset will no longer be needed once the service integrates the relaton-py library which converts Relaton data to BibXML on the fly. This is also necessary for serving other bibliographic formats like bibtex.
Once this is done we can also remove the unneeded dataset repos.
Originally posted by @ronaldtse in ietf-ribose#41 (comment)
Currently, task status/result persists in Redis, but if we want dataset indexing task history to be more reliable/exist for longer we should persist it in PostgreSQL (using django-celery-results). This would also make it available using Django ORM, making it more convenient to query task status.
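For reference, a minimal settings fragment for django-celery-results might look like the following. It assumes the Celery app loads Django settings with the CELERY_ namespace, and the INSTALLED_APPS list is abbreviated; the project's actual settings are not shown in this issue.

```python
# settings.py fragment (sketch). Assumes Celery is configured with
# app.config_from_object("django.conf:settings", namespace="CELERY").
INSTALLED_APPS = [
    "django.contrib.contenttypes",
    # Provides the TaskResult model and its migrations.
    "django_celery_results",
]

# Store task results through the Django ORM instead of Redis.
CELERY_RESULT_BACKEND = "django-db"
```

After running migrations, task history could then be queried through the django_celery_results.models.TaskResult model, for example by filtering on its status field.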
From @kesara:
The service must maintain the following backward compatibility with the existing service:
a. URL structure and file naming of the current web service. For example /public/rfc/bibxml/reference.RFC.7991.xml. This will allow existing tools to quickly shift to using the new service.
b. For certain datasets (detailed below) the service must support a ‘live’ file name, which always serves the latest version of an XML citation at the time of retrieval, while also supporting the serving of specific versions. For example: reference.I-D.ietf-stir-passport-rcd.xml will return the XML citation for the current version of draft-ietf-stir-passport-rcd at the time of the request, while draft-ietf-stir-passport-rcd-09.xml will always return the XML citation for version -09 of the Internet-Draft.
Originally posted by @kesara in ietf-ribose#6 (comment)
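Requirement (b) implies telling a 'live' filename apart from a versioned one. Below is a sketch under assumptions: it uses the filename shapes reference.I-D.{name}.xml and reference.I-D.draft-{name}-{NN}.xml, and the function name `parse_id_filename` is hypothetical.

```python
import re

# Optional "draft-" prefix, draft name, optional two-digit version suffix.
ID_FILENAME = re.compile(
    r"^reference\.I-D\.(?:draft-)?"
    r"(?P<name>[A-Za-z0-9][A-Za-z0-9-]*?)"
    r"(?:-(?P<version>\d{2}))?\.xml$"
)


def parse_id_filename(filename):
    """Return (draft name without 'draft-', version or None); None if the filename does not match."""
    m = ID_FILENAME.match(filename)
    if m is None:
        return None
    # version is None for the 'live' form, meaning "serve the latest version".
    return m.group("name"), m.group("version")
```

One known limitation of this sketch: a draft whose name itself ends in a hyphen followed by two digits would be read as a versioned request, so a real implementation would need to check candidates against the index.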
NIST references (bibxml-nist)
Legacy pattern(s) to implement:
We just have to do a {old-docid} mapping to the new IDs.
IETF BCP, FYI, STD (bibxml9, bibxml-rfcsubseries)
Legacy pattern(s) to implement:
We need to parse the pattern to return the appropriate BibXML content.
required for #13
From @yablokov:
In the API URL pattern, what identifiers do we support?
Right now, the bibxml patterns go like "rfcNNNN.xml".
Should we support:
It would be nice to get more examples of non-normalised identifiers.
Or maybe it is better to clean these identifiers of non-standard characters and then try to normalize them?
Originally posted by @yablokov in ietf-ribose#3 (comment)
Currently, even if DOI returns 503, BibXML returns the “not found” response.
Discovered by cURLing a reference with 503 result, and trying it using BibXML service (running from the same IP) getting “not found” response.
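One way to avoid reporting upstream failures as "not found" is to map the upstream status explicitly. This is only a sketch: the function name `status_for_upstream` is hypothetical, and answering 502 for upstream errors is an assumption about the desired behaviour, not something decided in this issue.

```python
def status_for_upstream(upstream_status):
    """Map the HTTP status received from an upstream source (e.g. DOI) to the status we return."""
    if 200 <= upstream_status < 300:
        return 200
    if upstream_status == 404:
        # The upstream really has no such reference.
        return 404
    # Anything else (503, timeouts mapped to 5xx, etc.) is an upstream problem,
    # so report a gateway error instead of "not found".
    return 502
```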
From @yablokov :
Now available (bibxml-indexer):
http://127.0.0.1:8001/api/v1/indexer/<dataset_name>/run
http://127.0.0.1:8001/api/v1/indexer/<dataset_name>/stop
http://127.0.0.1:8001/api/v1/indexer/<dataset_name>/reset
http://127.0.0.1:8001/api/v1/indexer/<dataset_name>/status
(as described in https://github.com/ietf-ribose/bibxml-indexer/blob/master/openapi.yaml )
In the indexer's settings.py I have configured these datasets:
- ecma
- nist
- ietf
- itu-r
- calconnect
- cie
- iso
- bipm
- iho
(taken from https://github.com/relaton?q=relaton-data-&type=&language=&sort= )
You can start indexing on the bibxml-indexer instance:
http://127.0.0.1:8001/api/v1/indexer/ecma/run
And read the result from bibxml-service after indexing finishes:
http://127.0.0.1:8000/api/v1/ref/ecma/ECMA-154
You can start indexing on the bibxml-indexer instance:
http://127.0.0.1:8001/api/v1/indexer/nist/run
And read the result from bibxml-service after indexing finishes:
http://127.0.0.1:8000/api/v1/ref/nist/LCIRC288
The repos/datasets in bibxml-indexer are read from configuration: https://github.com/ietf-ribose/bibxml-indexer/blob/master/indexer/settings.py
Currently, we hard-code “doi” in places, and route code to DOI retrieval function.
Instead, we should configure external datasets as a dictionary that maps external dataset ID to retrieval functions matching specific interface convention.
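A sketch of what such a registry could look like. The retrieval interface (a callable taking a reference string and returning a dict or None), the placeholder `get_doi_ref`, and the dispatcher `get_external_ref` are all assumptions for illustration; the real DOI retrieval function is not shown here.

```python
from typing import Callable, Dict, Optional

# Assumed interface: take a reference string, return citation data or None.
Retriever = Callable[[str], Optional[dict]]


def get_doi_ref(ref: str) -> Optional[dict]:
    """Placeholder standing in for the real DOI retrieval function."""
    return {"docid": [{"type": "DOI", "id": ref}]}


# Maps external dataset ID to its retrieval function.
EXTERNAL_DATASETS: Dict[str, Retriever] = {
    "doi": get_doi_ref,
}


def get_external_ref(dataset_id: str, ref: str) -> Optional[dict]:
    """Dispatch to the retrieval function registered for dataset_id."""
    try:
        retrieve = EXTERNAL_DATASETS[dataset_id]
    except KeyError:
        raise LookupError(f"Unknown external dataset: {dataset_id}")
    return retrieve(ref)
```

Adding another external source would then mean registering one more entry in the dictionary rather than adding a new hard-coded branch.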
DOI references (bibxml7)
Legacy pattern(s) to implement:
Currently, they are fixed: /public/rfc/{legacy_dataset_id}/reference.{ref}.xml.
BibXML service should allow for /public/rfc/{legacy_dataset_id}/{arbitrary_prefix}{ref}.xml.
By default, arbitrary_prefix="reference.".
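The configurable prefix could be handled by building the path pattern from it. This is a sketch only: the function name `legacy_path_pattern` is hypothetical, and the group names simply mirror the placeholders used above.

```python
import re


def legacy_path_pattern(arbitrary_prefix="reference."):
    """Build a regex for /public/rfc/{legacy_dataset_id}/{arbitrary_prefix}{ref}.xml."""
    return re.compile(
        r"^/public/rfc/(?P<legacy_dataset_id>[^/]+)/"
        # re.escape keeps characters such as "." in the prefix literal.
        + re.escape(arbitrary_prefix)
        + r"(?P<ref>[^/]+)\.xml$"
    )
```

The default reproduces the currently fixed pattern, while a prefix such as "rfc" would match filenames like rfc7991.xml.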
W3C (bibxml4)
Legacy pattern(s) to implement:
We need to parse the pattern to return the appropriate BibXML content.
Current bibxml reference: https://xml2rfc.tools.ietf.org/public/rfc/bibxml6/reference.IEEE.802.11_2012.xml
New bibxml legacy reference: http://34.229.41.119:8000/public/rfc/bibxml6/reference.IEEE_802-11.2012.xml
Note that:
- The file name is different: reference.IEEE.802.11_2012 vs reference.IEEE_802-11.2012.
- The reference anchor attribute data is different: anchor="IEEE.802.11_2012" vs anchor="IEEE.IEEE 802-11.2012".
- The organization data is different: <organization>IEEE</organization> vs <organization abbrev="IEEE">Institute of Electrical and Electronics Engineers</organization>.
These should match the existing bibxml service references.
The config file should be simple to use. Right now it mixes many different concepts, which makes it hard to understand, use, and configure.
For example, I can't tell the difference between these:
DATASET_SOURCE_OVERRIDES
AUTHORITATIVE_DATASETS
EXTERNAL_DATASETS
KNOWN_DATASETS
LEGACY_DATASETS
If I just want to do #40 or #41, what do I do? The file does not answer this.
10.6028/NIST.IR.7057 (this is a valid link: http://doi.org/10.6028/NIST.IR.7057)
See it being redirected to https://demo.bibxml.org/doi/10.6028%252FNIST.IR.7057/ and a failure message:
Requested reference not found: 10.6028/NIST.IR.7057
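The %252F in the redirect URL is what a slash looks like after being percent-encoded twice. The snippet below only demonstrates that encoding behaviour with the standard library; whether double encoding is the actual cause of the failure is not confirmed here.

```python
from urllib.parse import quote, unquote

doi = "10.6028/NIST.IR.7057"

# First encoding: "/" becomes "%2F".
once = quote(doi, safe="")
# Second encoding: the "%" itself becomes "%25", giving "%252F".
twice = quote(once, safe="")

# Decoding only once leaves an encoded slash, not the original DOI.
decoded_once = unquote(twice)
```

If the view decodes the path segment only once, it would look up "10.6028%2FNIST.IR.7057" instead of the DOI as entered.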
Example: https://demo.bibxml.org/api/v1/ref/ieee/IEEE_628.2020/?format=bibxml
Caused by a refactor that introduced get_indexed_ref_by_query, and the caller forgetting to pass the “format” argument through to it.
3GPP (bibxml5)
Legacy pattern(s) to implement:
3GPP documents are of the pattern like:
We need to parse the pattern to return the appropriate BibXML content.
@kwkwan These annotations are not necessary? Can we remove them?
“Misc” dataset is an old, manually crafted xml2rfc dataset that contains citations of various doctypes and docids, for some (or all) of which newer citation metadata is contained in other sources that we have.
We shouldn’t index “misc”, but we should allow the compatibility API (legacy xml2rfc paths) to resolve to doctype/docids (both dynamically by parsing the filename, and via manual assignment), and then return the corresponding new citation metadata indexed from those datasets, with fallback to pre-crawled data from the xml2rfc webserver (for cases where dynamic resolution yields an unknown doctype/docid and no manual assignment was provided).
(Similar handling is planned to be applied to other xml2rfc data as well.)
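The resolution order described above might be sketched like this. All names are hypothetical, plain dictionaries stand in for the index and the pre-crawled data, and giving manual assignment precedence over dynamic parsing is an assumption (the issue does not state the order).

```python
def resolve_legacy_ref(filename, manual_map, parse_filename, indexed, precrawled):
    """Return citation data for a legacy xml2rfc filename, or None.

    manual_map:     filename -> docid (manual assignment)
    parse_filename: callable returning a docid or None (dynamic resolution)
    indexed:        docid -> citation indexed from newer datasets
    precrawled:     filename -> citation crawled from the xml2rfc web server
    """
    # Assumption: a manual assignment wins over dynamic parsing.
    docid = manual_map.get(filename)
    if docid is None:
        docid = parse_filename(filename)
    if docid is not None and docid in indexed:
        return indexed[docid]
    # Unknown docid and no usable manual assignment: fall back to pre-crawled data.
    return precrawled.get(filename)
```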
In the latest bibxml5 dataset, there are two patterns of reference files:
- reference.SDO-3GPP.*.xml
- reference.3GPP.*.xml
We will need to build legacy file mappings for both, but I wanted to find out if the content really differs between them.
Since we already know the anchors are different (because the anchors follow the filenames), we will ignore the difference in anchors:
$ sed -i'.bak' 's#SDO-##g' reference.SDO-3GPP.*.xml
$ find . -name 'reference.SDO-3GPP.*.xml' -exec bash -c 'diff $0 ${0/SDO-/}' {} \;
The two collections of files differ in size:
$ ls -l reference.3GPP.*.xml | wc -l
2217
$ ls -l reference.SDO-3GPP.*.xml | wc -l
2110
So the pattern reference.3GPP.*.xml has 107 more files (2217 vs 2110) than reference.SDO-3GPP.*.xml.
The speculation is that the reference.SDO-3GPP.*.xml pattern contains older data than reference.3GPP.*.xml.
The actual differences between the files are:
5c5
< <title>Voice Broadcast Service (VBS); Stage 2</title>
---
> <title>Voice Broadcast service (VBS); Stage 2</title>
5c5
< <title>Location Services (LCS); Serving Mobile Location Centre - Serving Mobile Location Centre (SMLC - SMLC); SMLCPP specification</title>
---
> <title>Location Services (LCS): Serving Mobile Location Centre - Serving Mobile Location Centre (SMLC - SMLC); SMLCPP specification</title>
5c5
< <title>Customised Applications for Mobile network Enhanced Logic (CAMEL) Phase X; CAMEL Application Part (CAP) specification</title>
---
> <title>Customized Applications for Mobile network Enhanced Logic (CAMEL) Phase X; CAMEL Application Part (CAP) specification</title>
5c5
< <title>3G Security; Specification of the MILENAGE algorithm set: An example algorithm set for the 3GPP authentication and key generation functions f1, f1*, f2, f3, f4, f5 and f5*; Document 3: Implementors' test data</title>
---
> <title>3G Security; Specification of the MILENAGE algorithm set: An example algorithm set for the 3GPP authentication and key generation functions f1, f1*, f2, f3, f4, f5 and f5*; Document 3: Implementors’ test data</title>
5c5
< <title>Customised Applications for Mobile network Enhanced Logic (CAMEL); Service description; Stage 1</title>
---
> <title>Customized Applications for Mobile network Enhanced Logic (CAMEL); Service description; Stage 1</title>
5c5
< <title>3G security; LawfulInterception; Stage 2</title>
---
> <title>3G security; Lawful Interception; Stage 2</title>
5c5
< <title>IP Multimedia Subsystem (IMS) Application Level Gateway (IMS-ALG) - IMS Access Gateway (IMS-AGW) interface: Procedures descriptions</title>
---
> <title>IP Multimedia Subsystem (IMS) Application Level Gateway (IMS-ALG) – IMS Access Gateway (IMS-AGW) interface: Procedures descriptions</title>
5c5
< <title>Customised Applications for Mobile network Enhanced Logic (CAMEL) Phase 4; Stage 2; IM CN Interworking</title>
---
> <title>Customized Applications for Mobile network Enhanced Logic (CAMEL) Phase 4; Stage 2; IM CN Interworking</title>
5c5
< <title>Customised Applications for Mobile network Enhanced Logic (CAMEL) Phase 4; Stage 2</title>
---
> <title>Customized Applications for Mobile network Enhanced Logic (CAMEL) Phase 4; Stage 2</title>
5c5
< <title>Mobile radio interface layer 3 specification; Radio Resource Control (RRC) protocol; Iu mode</title>
---
> <title>Mobile radio interface layer 3 specification, Radio Resource Control (RRC) protocol; Iu mode</title>
5c5
< <title>Telecommunication management; Self-configuration of network elements Integration Reference Point (IRP); Solution Set (SS) definitions</title>
---
> <title>Telecommunication management; Self-Configuration of Network Elements Integration Reference Point (IRP); Solution Set (SS) definitions</title>
5c5
< <title>TISPAN; PSTN/ISDN simulation services Terminating Identification Presentation (TIP) and Terminating Identification Restriction (TIR); Protocol specification</title>
---
> <title>PSTN/ISDN simulation services Terminating Identification Presentation (TIP) and Terminating Identification Restriction (TIR); Protocol specification</title>
5c5
< <title>Telecommunication management; Generic Integration Reference Point (IRP) management; Solution Set (SS) definitions</title>
---
> <title>Telecommunication management; Generic Integration Reference Point (IRP) management; Solution Set (SS) Definitions</title>
5c5
< <title>3G Security; Lawful Interception; Stage 2</title>
---
> <title>Lawful Interception; Stage 2</title>
These aren't many.
We can immediately pick up those minor differences:
- hyphen (-) vs en dash (–)
In all cases, the pattern reference.3GPP.*.xml contains content that is more correct.
I propose that we make these assumptions:
- reference.3GPP.*.xml and reference.SDO-3GPP.*.xml are identical, and hence the content of reference.3GPP.*.xml and reference.SDO-3GPP.*.xml are meant to be identical.
- For reference.3GPP.*.xml files that do not have a corresponding reference.SDO-3GPP.*.xml file, when asked for that reference.SDO-3GPP.*.xml file, we can respond with the content of the reference.3GPP.*.xml file.
@rjsparks is that acceptable?
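If the proposal is accepted, the fallback could look roughly like this. The function name `resolve_3gpp_filename` is hypothetical, and a plain set of filenames stands in for whatever storage the service actually uses.

```python
def resolve_3gpp_filename(requested, available):
    """Pick the file to serve for a requested legacy 3GPP filename, or None.

    available is the set of filenames that actually exist.
    """
    if requested in available:
        return requested
    if requested.startswith("reference.SDO-3GPP."):
        # No SDO-3GPP file: fall back to the reference.3GPP.* counterpart.
        candidate = requested.replace("reference.SDO-3GPP.", "reference.3GPP.", 1)
        if candidate in available:
            return candidate
    return None
```

Since the anchors follow the filenames, a real implementation would probably also need to rewrite the anchor in the served content to match the requested SDO-3GPP name; that step is left out of this sketch.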