mvahowe / proskomma-js
A JS Implementation of the Proskomma Scripture Processing Model
License: MIT License
Depends on #35
Right now any GraphQL field may return null. This doesn't break anything, but the schema would be more robust if fields whose values should never be null were declared non-nullable.
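For illustration, a sketch of what non-nullable declarations could look like in SDL. The type and field names below are assumptions for the example, not Proskomma's actual schema:

```graphql
# Illustrative only: these names are not taken from the real schema.
type Document {
  id: String!            # non-nullable: a document always has an id
  mainSequence: Sequence!
  header(id: String!): String   # nullable: the requested header may not exist
}
```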
At present chapters and verse scopes may end up in headings or other secondary sequences. They should be forced to the main sequence. Other scopes may need this functionality, so we should come up with a generic way to make this happen.
Right now, importing a document requires a language code and an abbreviation. Documents with the same language code and abbreviation are added to the same docSet.
It is already clear that different Proskomma users will need different criteria for delimiting and filtering DocSets. Handling this involves
providing a way to specify which fields are required and allowed
providing a way to specify which required fields must match for documents within a docSet (ie when to create a new docSet)
exposing these fields within GraphQL so that, eg, it is possible to filter according to the ownership of the docSet
So Organisation X could define docSet membership by its own projectID, require language and country codes, and also allow a start date and an intended completion date. Organisation Y could define docSet membership by language and abbreviation, require the URL of the owner, and also allow a country code and the name of the project leader.
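One way the scheme above could work, sketched in JS. The spec shape and field names here are assumptions for illustration, not the implemented API:

```javascript
// Sketch only: a per-deployment spec saying which selectors exist, which
// are required, and which of them delimit docSets.
const selectorSpec = [
  { name: 'lang', required: true, delimitsDocSet: true },
  { name: 'abbr', required: true, delimitsDocSet: true },
  { name: 'country', required: false, delimitsDocSet: false },
];

// Build the docSet key for a document's selectors: documents whose
// delimiting selectors all match land in the same docSet.
const docSetKey = (spec, selectors) => {
  for (const field of spec.filter(f => f.required)) {
    if (!(field.name in selectors)) {
      throw new Error(`Missing required selector '${field.name}'`);
    }
  }
  return spec
    .filter(f => f.delimitsDocSet)
    .map(f => selectors[f.name])
    .join('_');
};

console.log(docSetKey(selectorSpec, { lang: 'fra', abbr: 'LSG' })); // "fra_LSG"
```

Optional selectors (like country above) are carried along but don't affect which docSet a document joins.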
As always, the more options we offer the better... except that we then need to support all the permutations of those options. I propose
None of this "solves" the question of how to store metadata in general, but it seems sensible to have a specific way to handle the metadata that controls how basic entities within Proskomma are structured.
All comments welcome.
Right now the filtering framework exists, but with a stub inclusion test that returns true. (Changing this to false removes all grafts and scopes.) The intended include/exclude mechanism is waiting for unit tests.
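A sketch of what the intended include/exclude test could look like, assuming the filter receives the item's type ('graft' or 'scope') and its label. The option names are hypothetical, not the shipped API:

```javascript
// Hypothetical option names; prefix matching on scope labels is an assumption.
const makeInclusionTest = ({ includeScopes, excludeScopes, includeGrafts }) =>
  (itemType, label) => {
    if (itemType === 'graft') {
      // No includeGrafts option means "keep all grafts".
      return !includeGrafts || includeGrafts.includes(label);
    }
    if (excludeScopes && excludeScopes.some(s => label.startsWith(s))) {
      return false;
    }
    return !includeScopes || includeScopes.some(s => label.startsWith(s));
  };

const inclusionTest = makeInclusionTest({
  includeScopes: ['chapter/', 'verse'],
  excludeScopes: ['verses/'],
});
console.log(inclusionTest('scope', 'chapter/3'));  // true
console.log(inclusionTest('scope', 'verses/14'));  // false
```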
USFM 3 allows colspans to be specified with, eg
\tcr1-3
At present the parser will not handle this, because dashes are not expected in tag names.
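A sketch of how the tag pattern could be extended so a numbered cell tag may carry a "-n" range suffix. The tag inventory below is illustrative, not the parser's actual regex:

```javascript
// Illustrative: accept th/thr/tc/tcr with a column number and an
// optional "-<n>" colspan range, as in \tcr1-3.
const cellTagRegex = /^(th|thr|tc|tcr)(\d+)(?:-(\d+))?$/;

const parseCellTag = tag => {
  const m = cellTagRegex.exec(tag);
  if (!m) return null;
  return {
    base: m[1],
    fromCol: parseInt(m[2], 10),
    toCol: m[3] ? parseInt(m[3], 10) : parseInt(m[2], 10),
  };
};

console.log(parseCellTag('tcr1-3')); // { base: 'tcr', fromCol: 1, toCol: 3 }
console.log(parseCellTag('tc2'));    // { base: 'tc', fromCol: 2, toCol: 2 }
```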
This will make use with React much simpler.
Right now I think these must be grafted onto the previous heading, which is strictly what the markup suggests but not very useful in practice. This is probably something to fix at the tidy stage.
There are places where the base tag matters and places where the numbered tag matters. It looks like scope matching needs more of the numbered tag logic.
This is required before allowing content deletion (but would be useful for other reasons)
One confusing USFM issue is that, eg, \q and \q1 mean the same thing. Right now all tags have a number, but this looks strange compared to "normal" USFM. The standard suggests that the number 1 should only be used if other numbers are used, but this is impossible to enforce without pre-scanning the entire document for usage. So I plan to remove the 1 systematically. (This can be revisited at the serialization stage if/when we output USFM.)
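The planned normalization could look like this. The list of numbered tags below is illustrative, not exhaustive:

```javascript
// Illustrative subset of tags that take a number in USFM.
const numberedTags = ['q', 's', 'mt', 'ms', 'li', 'pi'];

// Strip a trailing "1" so \q and \q1 are stored identically;
// leave \q2, \q3 etc untouched.
const normalizeTagNumber = tag => {
  const m = /^([a-z]+)(\d+)$/.exec(tag);
  if (m && m[2] === '1' && numberedTags.includes(m[1])) {
    return m[1];
  }
  return tag;
};

console.log(normalizeTagNumber('q1')); // "q"
console.log(normalizeTagNumber('q2')); // "q2"
console.log(normalizeTagNumber('p'));  // "p"
```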
EDIT: pubnumber and altnumber are now supported. This leaves
This is something that went in and out of the spec several times. I think we should put it back because, otherwise, it will be hard to edit a block while preserving grafts to headings attached to the start of that block.
At present, we treat character-level tags as if they nest, as in HTML. According to the spec, in some cases opening a tag should close the preceding tag. This is most obvious in footnotes.
The parser_spec format should be able to handle this. The hardest bit will probably be unpicking exactly when this should happen.
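One way to model the implicit-close behaviour, sketched with a lookup table. The footnote tags listed are the obvious USFM cases, but the full inventory of which tags close which would need checking against the spec:

```javascript
// Assumption: which open tags implicitly close which currently-open tags.
const implicitlyCloses = {
  fq: ['ft', 'fqa'],
  fqa: ['ft', 'fq'],
  ft: ['fq', 'fqa'],
};

// Open a character-level tag, popping any tags it implicitly closes
// instead of nesting inside them.
const openCharTag = (stack, tag) => {
  const closes = implicitlyCloses[tag] || [];
  while (stack.length > 0 && closes.includes(stack[stack.length - 1])) {
    stack.pop(); // implicit close
  }
  stack.push(tag);
  return stack;
};

const stack = [];
['ft', 'fq', 'ft'].forEach(t => openCharTag(stack, t));
console.log(stack); // ['ft']: each open implicitly closed the previous tag
```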
The basic idea is fewer fields and more optional arguments. So, eg, docSet with optional ids, selectors, withBook etc rather than docSetWithBook, docSetBySelectors etc.
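A sketch of the shape this could take in SDL. The argument names follow the examples above, but the SelectorInput type and the DocSet list shape are assumptions:

```graphql
input SelectorInput {
  key: String!
  value: String!
}

type Query {
  # One docSets field with optional filters, replacing docSetsWithBook,
  # docSetsBySelectors and friends.
  docSets(ids: [String!], withBook: String, selectors: [SelectorInput!]): [DocSet!]!
}
```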
nBytes (variable-length integers) are used extensively. In most cases they occur once, as the last part of a record, so there's no need to find what comes next. (There's a separate way to find the next record.) But, in the case of attributes, which may have many parts, we need to find the start of successive nBytes.
One option is to add a method that returns the length as well as the value.
Another, which may be simpler, is to calculate the length from the returned value.
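The second option can be sketched as follows, assuming an nByte encoding that stores 7 payload bits per byte (the actual Proskomma layout should be checked before relying on this):

```javascript
// Given the decoded value, recover how many bytes it occupied, so
// successive nBytes can be walked without storing explicit lengths.
const nByteLength = value => {
  let length = 1;
  while (value > 127) { // more than 7 bits left => another byte was used
    value >>= 7;
    length += 1;
  }
  return length;
};

console.log(nByteLength(0));     // 1
console.log(nByteLength(127));   // 1
console.log(nByteLength(128));   // 2
console.log(nByteLength(16384)); // 3
```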
The tidier currently removes blocks with no content. This can sometimes result in sequences with no content, so those sequences should be removed too.
Empty blocks should be removed - except when they are supposed to be empty, eg \b
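The two tidy rules above could be combined like this. The block and sequence shapes here are assumptions for illustration, not the internal data model:

```javascript
// Assumption: blocks look like {bs: 'blockTag/p', items: [...]}
// and sequences like {id, blocks: [...]}.
const intentionallyEmpty = new Set(['b']); // \b marks a deliberate blank line

const tidySequences = sequences =>
  sequences
    .map(seq => ({
      ...seq,
      // Drop empty blocks, unless the tag says "empty on purpose".
      blocks: seq.blocks.filter(
        b => b.items.length > 0 || intentionallyEmpty.has(b.bs.split('/')[1])
      ),
    }))
    // A sequence left with no blocks should be dropped too.
    .filter(seq => seq.blocks.length > 0);

const tidied = tidySequences([
  { id: 'a', blocks: [{ bs: 'blockTag/p', items: [] }] },
  { id: 'b', blocks: [{ bs: 'blockTag/b', items: [] }, { bs: 'blockTag/p', items: ['x'] }] },
]);
console.log(tidied.map(s => s.id)); // ['a' is gone, 'b' survives]
```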
Proskomma uses counted strings with a single-byte counter. The lexers should therefore split strings longer than 255 bytes into multiple tokens.
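A sketch of the splitting step, using only the standard TextEncoder so it runs in Node and browsers. It backs up so a multi-byte character is never cut in half:

```javascript
const MAX_BYTES = 255; // single-byte counter limit

// Split a token into chunks of at most 255 UTF-8 bytes, never splitting
// inside a code point.
const splitToken = str => {
  const encoder = new TextEncoder();
  const chunks = [];
  let current = '';
  let currentBytes = 0;
  for (const ch of str) { // iterate by code point, not UTF-16 unit
    const chBytes = encoder.encode(ch).length;
    if (currentBytes + chBytes > MAX_BYTES) {
      chunks.push(current);
      current = '';
      currentBytes = 0;
    }
    current += ch;
    currentBytes += chBytes;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
};

console.log(splitToken('a'.repeat(600)).map(c => c.length)); // [255, 255, 90]
```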
At present spacing is often missing between words. This is probably because the lexing regex is overly greedy.
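One less greedy split, for illustration: emit whitespace as its own token rather than letting the word pattern swallow it, so spacing survives a round trip. The exact character classes in the real lexer differ:

```javascript
// Words, runs of whitespace, and runs of other characters each
// become separate tokens; nothing is dropped.
const lexRegex = /[\p{L}\p{N}]+|\s+|[^\s\p{L}\p{N}]+/gu;

const tokens = 'In the beginning, God'.match(lexRegex);
console.log(tokens);
// ['In', ' ', 'the', ' ', 'beginning', ',', ' ', 'God']
console.log(tokens.join('') === 'In the beginning, God'); // true
```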
This should be done in the tidier.
This is a stupid naming error that should be fixed to avoid confusion, but which will probably involve updating tests etc.
Parsing USFM in order produces
"subType": "endScope",
"label": "verses/14"
},
{
"subType": "endScope",
"label": "verse/14"
},
{
"subType": "startScope",
"label": "verse/15"
},
{
"subType": "startScope",
"label": "verses/15"
},
{
"subType": "endScope",
"label": "printVerse/1b"
},
{
"subType": "startScope",
"label": "printVerse/2b"
}
It would be better if the printVerse endScope came before the verse and verses endScopes.
This requires a second parsing of introductions, looking only at the block level, to break out grafts for introduction headings (and maybe other blocks) as necessary.
This is effectively how USFM3 section introductions work.
These turn out to present unique challenges since they can contain \p etc, which normally go to the "main" sequence. (In introductions, titles, etc, the markup is different.)
Related to #14