mvahowe / proskomma-js
A JS Implementation of the Proskomma Scripture Processing Model
License: MIT License
Depends on #35
Right now any GraphQL field may return null. This doesn't break anything, but the schema would be more robust if fields whose values should never be null were declared non-nullable.
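For illustration, a sketch of what non-nullable declarations could look like in SDL. The type and field names below are assumptions for the example, not Proskomma's actual schema:

```graphql
# Illustrative only: these names are not taken from the real schema.
type Document {
  id: String!            # non-nullable: a document always has an id
  mainSequence: Sequence!
  header(id: String!): String   # nullable: the requested header may not exist
}
```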
At present chapters and verse scopes may end up in headings or other secondary sequences. They should be forced to the main sequence. Other scopes may need this functionality, so we should come up with a generic way to make this happen.
Right now, importing a document requires a language code and an abbreviation. Documents with the same language code and abbreviation are added to the same docSet.
It is already clear that different Proskomma users will need different criteria for delimiting and filtering DocSets. Handling this involves
providing a way to specify which fields are required and allowed
providing a way to specify which required fields must match for documents within a docSet (ie when to create a new docSet)
exposing these fields within GraphQL so that, eg, it is possible to filter according to the ownership of the docSet
So Organisation X could define docSet membership by its own projectID, require language and country codes, and also allow a start date and an intended completion date. Organisation Y could define docSet membership by language and abbreviation, require the URL of the owner, and also allow a country code and the name of the project leader.
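One way the scheme above could work, sketched in JS. The spec shape and field names here are assumptions for illustration, not the implemented API:

```javascript
// Sketch only: a per-deployment spec saying which selectors exist, which
// are required, and which of them delimit docSets.
const selectorSpec = [
  { name: 'lang', required: true, delimitsDocSet: true },
  { name: 'abbr', required: true, delimitsDocSet: true },
  { name: 'country', required: false, delimitsDocSet: false },
];

// Build the docSet key for a document's selectors: documents whose
// delimiting selectors all match land in the same docSet.
const docSetKey = (spec, selectors) => {
  for (const field of spec.filter(f => f.required)) {
    if (!(field.name in selectors)) {
      throw new Error(`Missing required selector '${field.name}'`);
    }
  }
  return spec
    .filter(f => f.delimitsDocSet)
    .map(f => selectors[f.name])
    .join('_');
};

console.log(docSetKey(selectorSpec, { lang: 'fra', abbr: 'LSG' })); // "fra_LSG"
```

Optional selectors (like country above) are carried along but don't affect which docSet a document joins.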
As always, the more options we offer the better... except that we then need to support all the permutations of those options. I propose
None of this "solves" the question of how to store metadata in general, but it seems sensible to have a specific way to handle the metadata that controls how basic entities within Proskomma are structured.
All comments welcome.
Right now the filtering framework exists, but with a stub inclusion test that returns true. (Changing this to false removes all grafts and scopes.) The intended include/exclude mechanism is waiting for unit tests.
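A sketch of what the intended include/exclude test could look like, assuming the filter receives the item's type ('graft' or 'scope') and its label. The option names are hypothetical, not the shipped API:

```javascript
// Hypothetical option names; prefix matching on scope labels is an assumption.
const makeInclusionTest = ({ includeScopes, excludeScopes, includeGrafts }) =>
  (itemType, label) => {
    if (itemType === 'graft') {
      // No includeGrafts option means "keep all grafts".
      return !includeGrafts || includeGrafts.includes(label);
    }
    if (excludeScopes && excludeScopes.some(s => label.startsWith(s))) {
      return false;
    }
    return !includeScopes || includeScopes.some(s => label.startsWith(s));
  };

const inclusionTest = makeInclusionTest({
  includeScopes: ['chapter/', 'verse'],
  excludeScopes: ['verses/'],
});
console.log(inclusionTest('scope', 'chapter/3'));  // true
console.log(inclusionTest('scope', 'verses/14'));  // false
```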
USFM 3 allows colspans to be specified with, eg
\tcr1-3
At present the parser will not handle this, because dashes are not expected in tag names.
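A sketch of how the tag pattern could be extended so a numbered cell tag may carry a "-n" range suffix. The tag inventory below is illustrative, not the parser's actual regex:

```javascript
// Illustrative: accept th/thr/tc/tcr with a column number and an
// optional "-<n>" colspan range, as in \tcr1-3.
const cellTagRegex = /^(th|thr|tc|tcr)(\d+)(?:-(\d+))?$/;

const parseCellTag = tag => {
  const m = cellTagRegex.exec(tag);
  if (!m) return null;
  return {
    base: m[1],
    fromCol: parseInt(m[2], 10),
    toCol: m[3] ? parseInt(m[3], 10) : parseInt(m[2], 10),
  };
};

console.log(parseCellTag('tcr1-3')); // { base: 'tcr', fromCol: 1, toCol: 3 }
console.log(parseCellTag('tc2'));    // { base: 'tc', fromCol: 2, toCol: 2 }
```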
This will make use with React much simpler.
Right now I think these must be grafted onto the previous heading, which is strictly what the markup suggests but not very useful in practice. This is probably something to fix at the tidy stage.
There are places where the base tag matters and places where the numbered tag matters. It looks like scope matching needs more of the numbered tag logic.
This is required before allowing content deletion (but would be useful for other reasons)
One confusing USFM issue is that, eg, \q and \q1 mean the same thing. Right now all tags have a number, but this looks strange compared to "normal" USFM. The standard suggests that the number 1 should only be used if other numbers are used, but this is impossible to enforce without pre-scanning the entire document for usage. So I plan to remove the 1 systematically. (This can be revisited at the serialization stage if/when we output USFM.)
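The planned normalization could look like this. The list of numbered tags below is illustrative, not exhaustive:

```javascript
// Illustrative subset of tags that take a number in USFM.
const numberedTags = ['q', 's', 'mt', 'ms', 'li', 'pi'];

// Strip a trailing "1" so \q and \q1 are stored identically;
// leave \q2, \q3 etc untouched.
const normalizeTagNumber = tag => {
  const m = /^([a-z]+)(\d+)$/.exec(tag);
  if (m && m[2] === '1' && numberedTags.includes(m[1])) {
    return m[1];
  }
  return tag;
};

console.log(normalizeTagNumber('q1')); // "q"
console.log(normalizeTagNumber('q2')); // "q2"
console.log(normalizeTagNumber('p'));  // "p"
```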
EDIT: pubnumber and altnumber are now supported. This leaves
This is something that went in and out of the spec several times. I think we should put it back because, otherwise, it will be hard to edit a block while preserving grafts to headings attached to the start of that block.
At present, we treat character-level tags as if they nest, as in HTML. According to the spec, in some cases opening a tag should close the preceding tag. This is most obvious in footnotes.
The parser_spec format should be able to handle this. The hardest bit will probably be unpicking exactly when this should happen.
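One way to model the implicit-close behaviour, sketched with a lookup table. The footnote tags listed are the obvious USFM cases, but the full inventory of which tags close which would need checking against the spec:

```javascript
// Assumption: which open tags implicitly close which currently-open tags.
const implicitlyCloses = {
  fq: ['ft', 'fqa'],
  fqa: ['ft', 'fq'],
  ft: ['fq', 'fqa'],
};

// Open a character-level tag, popping any tags it implicitly closes
// instead of nesting inside them.
const openCharTag = (stack, tag) => {
  const closes = implicitlyCloses[tag] || [];
  while (stack.length > 0 && closes.includes(stack[stack.length - 1])) {
    stack.pop(); // implicit close
  }
  stack.push(tag);
  return stack;
};

const stack = [];
['ft', 'fq', 'ft'].forEach(t => openCharTag(stack, t));
console.log(stack); // ['ft']: each open implicitly closed the previous tag
```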
The basic idea is fewer fields and more optional arguments. So, eg, docSet with optional ids, selectors, withBook etc rather than docSetWithBook, docSetBySelectors etc.
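A sketch of the shape this could take in SDL. The argument names follow the examples above, but the SelectorInput type and the DocSet list shape are assumptions:

```graphql
input SelectorInput {
  key: String!
  value: String!
}

type Query {
  # One docSets field with optional filters, replacing docSetsWithBook,
  # docSetsBySelectors and friends.
  docSets(ids: [String!], withBook: String, selectors: [SelectorInput!]): [DocSet!]!
}
```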
nBytes (variable-length integers) are used extensively. In most cases they occur once, as the last part of a record, so there's no need to find what comes next. (There's a separate way to find the next record.) But, in the case of attributes, which may have many parts, we need to find the start of successive nBytes.
One option is to add a method that returns the length as well as the value.
Another, which may be simpler, is to calculate the length from the returned value.
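The second option can be sketched as follows, assuming an nByte encoding that stores 7 payload bits per byte (the actual Proskomma layout should be checked before relying on this):

```javascript
// Given the decoded value, recover how many bytes it occupied, so
// successive nBytes can be walked without storing explicit lengths.
const nByteLength = value => {
  let length = 1;
  while (value > 127) { // more than 7 bits left => another byte was used
    value >>= 7;
    length += 1;
  }
  return length;
};

console.log(nByteLength(0));     // 1
console.log(nByteLength(127));   // 1
console.log(nByteLength(128));   // 2
console.log(nByteLength(16384)); // 3
```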
The tidier currently removes blocks with no content. This can sometimes result in sequences with no content, so those sequences should be removed too.
Empty blocks should be removed - except when they are supposed to be empty, eg \b
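The two tidy rules above could be combined like this. The block and sequence shapes here are assumptions for illustration, not the internal data model:

```javascript
// Assumption: blocks look like {bs: 'blockTag/p', items: [...]}
// and sequences like {id, blocks: [...]}.
const intentionallyEmpty = new Set(['b']); // \b marks a deliberate blank line

const tidySequences = sequences =>
  sequences
    .map(seq => ({
      ...seq,
      // Drop empty blocks, unless the tag says "empty on purpose".
      blocks: seq.blocks.filter(
        b => b.items.length > 0 || intentionallyEmpty.has(b.bs.split('/')[1])
      ),
    }))
    // A sequence left with no blocks should be dropped too.
    .filter(seq => seq.blocks.length > 0);

const tidied = tidySequences([
  { id: 'a', blocks: [{ bs: 'blockTag/p', items: [] }] },
  { id: 'b', blocks: [{ bs: 'blockTag/b', items: [] }, { bs: 'blockTag/p', items: ['x'] }] },
]);
console.log(tidied.map(s => s.id)); // ['a' is gone, 'b' survives]
```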
Proskomma uses counted strings with a single-byte counter. The lexers should therefore split strings longer than 255 bytes into multiple tokens.
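A sketch of the splitting step, using only the standard TextEncoder so it runs in Node and browsers. It backs up so a multi-byte character is never cut in half:

```javascript
const MAX_BYTES = 255; // single-byte counter limit

// Split a token into chunks of at most 255 UTF-8 bytes, never splitting
// inside a code point.
const splitToken = str => {
  const encoder = new TextEncoder();
  const chunks = [];
  let current = '';
  let currentBytes = 0;
  for (const ch of str) { // iterate by code point, not UTF-16 unit
    const chBytes = encoder.encode(ch).length;
    if (currentBytes + chBytes > MAX_BYTES) {
      chunks.push(current);
      current = '';
      currentBytes = 0;
    }
    current += ch;
    currentBytes += chBytes;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
};

console.log(splitToken('a'.repeat(600)).map(c => c.length)); // [255, 255, 90]
```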
At present spacing is often missing between words. This is probably because the lexing regex is overly greedy.
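One less greedy split, for illustration: emit whitespace as its own token rather than letting the word pattern swallow it, so spacing survives a round trip. The exact character classes in the real lexer differ:

```javascript
// Words, runs of whitespace, and runs of other characters each
// become separate tokens; nothing is dropped.
const lexRegex = /[\p{L}\p{N}]+|\s+|[^\s\p{L}\p{N}]+/gu;

const tokens = 'In the beginning, God'.match(lexRegex);
console.log(tokens);
// ['In', ' ', 'the', ' ', 'beginning', ',', ' ', 'God']
console.log(tokens.join('') === 'In the beginning, God'); // true
```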
This should be done in the tidier.
This is a stupid naming error that should be fixed to avoid confusion, but which will probably involve updating tests etc.
Parsing USFM in order produces
"subType": "endScope",
"label": "verses/14"
},
{
"subType": "endScope",
"label": "verse/14"
},
{
"subType": "startScope",
"label": "verse/15"
},
{
"subType": "startScope",
"label": "verses/15"
},
{
"subType": "endScope",
"label": "printVerse/1b"
},
{
"subType": "startScope",
"label": "printVerse/2b"
}
It would be better if the printVerse endScope came before the verse and verses endScopes.
This requires a second parsing of introductions, looking only at the block level, to break out grafts for introduction headings (and maybe other blocks) as necessary.
This is effectively how USFM3 section introductions work.
These turn out to present unique challenges since they can contain \p etc, which normally go to the "main" sequence. (In introductions, titles, etc, the markup is different.)
Related to #14