
proskomma-js's People

Contributors

bhalbright, danielc-n, imad-hamzi, mandolyte, mvahowe, superdav42

proskomma-js's Issues

Add notNullable to schema

Right now, any GraphQL node may return null. This doesn't break anything, but it would be more robust to declare values that should never be null as non-nullable in the schema.

  • index
  • doc_set
  • document
  • sequence
  • block
  • item
  • token
  • graft
  • scope
  • key_value

Force chapter/verses to main sequence

At present, chapter and verse scopes may end up in headings or other secondary sequences. They should be forced to the main sequence. Other scopes may need this functionality, so we should come up with a generic way to make this happen.

Selectors for DocSets

Right now, importing a document requires a language code and an abbreviation. Documents with the same language code and abbreviation are added to the same docSet.

It is already clear that different Proskomma users will need different criteria for delimiting and filtering DocSets. Handling this involves

  • providing a way to specify which fields are required and allowed

  • providing a way to specify which required fields must match for documents within a docSet (ie when to create a new docSet)

  • exposing these fields within GraphQL so that, eg, it is possible to filter according to the ownership of the docSet

So Organization X could define docSet membership by their own projectID, require language and country codes, and also allow start date and intended completion date. Organization Y could define docSet membership by language and abbreviation, also require the URL of the owner, and also allow country code and the name of the project leader.

As always, the more options we offer the better... except that we then need to support all the permutations of those options. I propose

  • fields may be strings, integers, arrays of strings or arrays of integers.
  • scalar values (ie not arrays) must be non-null (or absent in the case of optional fields)
  • arrays must be present when required, but may be empty.
  • docSet membership must be delimited by one or more required, scalar fields.
  • docSets may be filtered by any combination of fields, using AND logic, where arrays are treated like lists of tags and may be required to include or not include particular values.
  • the field setup is specified when a Proskomma instance is constructed and may not be modified.
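The rules proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions: `selectorSpec`, `validateSelectors`, `docSetKey`, `matchesFilter` and the example field names are all hypothetical and are not part of Proskomma's API.

```javascript
// Hypothetical field setup, fixed when the instance is constructed.
// type is one of: "string", "integer", "string[]", "integer[]".
const selectorSpec = Object.freeze([
  { name: "lang", type: "string", required: true, delimitsDocSet: true },
  { name: "abbr", type: "string", required: true, delimitsDocSet: true },
  { name: "ownerUrl", type: "string", required: true, delimitsDocSet: false },
  { name: "tags", type: "string[]", required: false, delimitsDocSet: false },
]);

const isScalarOfType = (value, baseType) =>
  baseType === "integer" ? Number.isInteger(value) : typeof value === "string";

// Throws if the supplied selectors break the rules in the proposal.
function validateSelectors(spec, selectors) {
  const known = new Set(spec.map((f) => f.name));
  for (const name of Object.keys(selectors)) {
    if (!known.has(name)) throw new Error(`Unexpected selector '${name}'`);
  }
  for (const field of spec) {
    const present = Object.prototype.hasOwnProperty.call(selectors, field.name);
    if (!present) {
      if (field.required) throw new Error(`Missing required selector '${field.name}'`);
      continue; // optional fields may be absent
    }
    const value = selectors[field.name];
    const isArrayType = field.type.endsWith("[]");
    const baseType = field.type.replace("[]", "");
    // Scalars must be non-null; arrays may be empty.
    const ok = isArrayType
      ? Array.isArray(value) && value.every((v) => isScalarOfType(v, baseType))
      : isScalarOfType(value, baseType);
    if (!ok) throw new Error(`Selector '${field.name}' is not of type ${field.type}`);
  }
}

// Documents whose delimiting fields all match end up in the same docSet.
function docSetKey(spec, selectors) {
  validateSelectors(spec, selectors);
  return JSON.stringify(
    spec.filter((f) => f.delimitsDocSet).map((f) => selectors[f.name])
  );
}

// AND filter; arrays are treated like tag lists with include/exclude tests.
function matchesFilter(selectors, filter) {
  return Object.entries(filter).every(([name, test]) => {
    const value = selectors[name];
    if (Array.isArray(value)) {
      return (
        (test.includes || []).every((v) => value.includes(v)) &&
        (test.excludes || []).every((v) => !value.includes(v))
      );
    }
    return value === test;
  });
}
```

With this setup, two documents that share `lang` and `abbr` get the same key even if their `ownerUrl` or `tags` differ, which is the "when to create a new docSet" decision described above.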

None of this "solves" the question of how to store metadata in general, but it seems sensible to have a specific way to handle the metadata that controls how basic entities within Proskomma are structured.

All comments welcome.

Finish implementing parse-time filtering

Right now the filtering framework exists, but with a stub inclusion test that returns true. (Changing this to false removes all grafts and scopes.) The intended include/exclude mechanism is waiting for unit tests.

Support colspan in USFM lexer

USFM 3 allows colspans to be specified with, eg

\tcr1-3

This will not work at present with the parser because dashes are not expected.
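As a rough sketch of what accepting the dash might look like, the regular expression below recognizes table cell markers with an optional column range. The pattern and the `parseTableCellTag` helper are assumptions made for illustration; they are not taken from Proskomma's lexer, and the set of marker names covered (`th`, `thr`, `thc`, `tc`, `tcr`, `tcc`) is my reading of USFM 3 rather than something stated in this issue.

```javascript
// Matches table cell markers such as \tc1, \thr2 or \tcr1-3.
// Group 1: marker name; group 2: first column; group 3: last column (optional).
const tableCellRegex = /^\\(t[hc][rc]?)(\d+)(?:-(\d+))?$/;

function parseTableCellTag(tag) {
  const match = tableCellRegex.exec(tag);
  if (!match) return null;
  const fromCol = Number(match[2]);
  // Without a dash the cell covers a single column.
  const toCol = match[3] === undefined ? fromCol : Number(match[3]);
  if (toCol < fromCol) return null; // reject reversed ranges such as \tc3-1
  return { marker: match[1], fromCol, toCol, colspan: toCol - fromCol + 1 };
}
```

Here `parseTableCellTag("\\tcr1-3")` yields a `tcr` cell spanning columns 1 to 3, while a marker without a dash is reported with a colspan of 1.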

Handle REM before content starts

Right now I think these must be grafted onto the previous heading, which is strictly what the markup suggests but not very useful in practice. This is probably something to fix at the tidy stage.

Rehash Enums

This is required before allowing content deletion (but would be useful for other reasons)

Remove 1 from first-level tags

One confusing USFM issue is that, eg, q and q1 mean the same thing. Right now all tags have a number, but this looks strange compared to "normal" USFM. The standard suggests that the number 1 should only be used if other numbers are used, but this is impossible to enforce without pre-scanning the entire document for usage. So I plan to remove the 1 systematically. (This can be revisited at the serialization stage if/when we output USFM.)
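A minimal sketch of the systematic removal, assuming tag names arrive without the leading backslash. `normalizeTagLevel` and `NON_LEVEL_TAGS` are hypothetical names. The exclusion list reflects my assumption that some markers ending in 1 (such as `toc1`) are distinct markers rather than first-level variants; the real list would need checking against the USFM standard.

```javascript
// Markers whose trailing "1" is assumed NOT to be a level number.
const NON_LEVEL_TAGS = new Set(["toc1", "toca1"]);

// "q1" -> "q"; "q", "q2" and "q10" are returned unchanged.
function normalizeTagLevel(tagName) {
  if (NON_LEVEL_TAGS.has(tagName)) return tagName;
  // Letters followed by exactly the digit 1 and nothing else.
  const match = /^([a-z]+)1$/.exec(tagName);
  return match ? match[1] : tagName;
}
```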

Finish USX lexer

EDIT: pubnumber and altnumber are now supported. This leaves

  • lemma and strong in char
  • ms (milestones)
  • sidebar
  • periph
  • figure
  • optbreak

Block-level grafts

This is something that went in and out of the spec several times. I think we should put it back because, otherwise, it will be hard to edit a block while preserving grafts to headings attached to the start of that block.

Implement implicit tag closing

At present, we treat character-level tags as if they nest, as in HTML. According to the spec, in some cases opening a tag should close the preceding tag. This is most obvious in footnotes.

The parser_spec format should be able to handle this. The hardest bit will probably be unpicking exactly when this should happen.

  • Add \+ syntax to lexer and start/end tag preTokens
  • Make char tags produced by USX lexer use \+
  • Add endChar to parser and parser spec.

Rationalize GraphQL Filters

The basic idea is fewer fields and more optional arguments. So, eg, docSet with optional ids, selectors, withBook etc rather than docSetWithBook, docSetBySelectors etc.

Add way to get length of an nByte

nBytes (variable-length integers) are used extensively. In most cases they occur once, as the last part of a record, so there's no need to find what comes next. (There's a separate way to find the next record.) But, in the case of attributes, which may have many parts, we need to find the start of successive nBytes.

One option is to add a method that returns the length as well as the value.

Another, which may be simpler, is to calculate the length from the returned value.
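Both options can be sketched as below. The byte layout used here is an assumption, since the issue does not describe it: little-endian groups of 7 bits, one group per byte, with the high bit set only on the final byte. `writeNByte`, `readNByte` and `nByteLength` are illustrative names, not existing Proskomma methods.

```javascript
// Assumed encoding: 7 bits per byte, least significant group first,
// high bit set on the last byte only.
function writeNByte(value) {
  const bytes = [];
  let rest = value;
  while (rest >= 128) {
    bytes.push(rest % 128);
    rest = Math.floor(rest / 128);
  }
  bytes.push(rest + 128);
  return Uint8Array.from(bytes);
}

// Option 1: return the length as well as the value.
function readNByte(bytes, pos) {
  let value = 0;
  let multiplier = 1;
  let length = 0;
  for (;;) {
    if (pos + length >= bytes.length) throw new RangeError("Truncated nByte");
    const byte = bytes[pos + length];
    length += 1;
    if (byte >= 128) {
      // High bit set: this is the final byte.
      return { value: value + (byte - 128) * multiplier, length };
    }
    value += byte * multiplier;
    multiplier *= 128;
  }
}

// Option 2: calculate the length from the decoded value.
function nByteLength(value) {
  let length = 1;
  let rest = value;
  while (rest >= 128) {
    rest = Math.floor(rest / 128);
    length += 1;
  }
  return length;
}
```

Option 2 only gives the right answer if values are always written in their shortest form, which holds for `writeNByte` above but would need confirming for the real encoder. With either option, successive nBytes in an attribute record can be read by advancing the position by the reported length.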

Remove sequences with no blocks

The tidier currently removes blocks with no content. This can sometimes result in sequences with no content, so they should be removed too.
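One way the cascade might be handled is sketched below. The document shape is hypothetical and chosen only for illustration (`mainId`, `sequences`, `blocks`, `items`, and graft items carrying a `seqId`); Proskomma's internal structures are likely to differ. The loop repeats because dropping a graft to a removed sequence can leave a block, and then its sequence, empty.

```javascript
// Hypothetical shape:
// { mainId, sequences: { [id]: { blocks: [ { items: [ { type, seqId? } ] } ] } } }
function removeEmptySequences(doc) {
  let changed = true;
  while (changed) {
    changed = false;
    // Remove sequences with no blocks, but never the main sequence.
    for (const [id, sequence] of Object.entries(doc.sequences)) {
      if (id !== doc.mainId && sequence.blocks.length === 0) {
        delete doc.sequences[id];
        changed = true;
      }
    }
    for (const sequence of Object.values(doc.sequences)) {
      for (const block of sequence.blocks) {
        // Drop grafts that now point at a removed sequence.
        block.items = block.items.filter(
          (item) => item.type !== "graft" || item.seqId in doc.sequences
        );
      }
      // Blocks left with no content are removed, as the tidier already does.
      const kept = sequence.blocks.filter((block) => block.items.length > 0);
      if (kept.length !== sequence.blocks.length) {
        sequence.blocks = kept;
        changed = true;
      }
    }
  }
  return doc;
}
```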

Limit maximum token chars length

Proskomma uses counted strings with a single-byte counter. The lexers should therefore split strings longer than 255 bytes into multiple tokens.
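A sketch of such a split, assuming the counter measures UTF-8 bytes. `splitByUtf8Length` is a hypothetical helper, not an existing lexer function. It splits between code points so that no multi-byte character is cut in half, although it makes no attempt to keep combining sequences together.

```javascript
const MAX_TOKEN_BYTES = 255;
const utf8Encoder = new TextEncoder();

function splitByUtf8Length(text, maxBytes = MAX_TOKEN_BYTES) {
  const chunks = [];
  let current = "";
  let currentBytes = 0;
  // for...of iterates by code point, so surrogate pairs stay whole.
  for (const char of text) {
    const charBytes = utf8Encoder.encode(char).length;
    if (currentBytes + charBytes > maxBytes && current.length > 0) {
      chunks.push(current);
      current = "";
      currentBytes = 0;
    }
    current += char;
    currentBytes += charBytes;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Each returned chunk would then become its own token, and concatenating the chunks gives back the original string.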

printChapter => pubChapter

This is a stupid naming error that should be fixed to avoid confusion, but which will probably involve updating tests etc.

Reorder \v, \vp, \c, \cp

Parsing USFM in order produces

  {
    "subType": "endScope",
    "label": "verses/14"
  },
  {
    "subType": "endScope",
    "label": "verse/14"
  },
  {
    "subType": "startScope",
    "label": "verse/15"
  },
  {
    "subType": "startScope",
    "label": "verses/15"
  },
  {
    "subType": "endScope",
    "label": "printVerse/1b"
  },
  {
    "subType": "startScope",
    "label": "printVerse/2b"
  }

It would be better if the printVerse endScope came before the verse endScopes.
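A minimal sketch of that reordering, applied to one run of adjacent scope items shaped like the output shown here. `reorderScopeRun` is a hypothetical helper. It only moves printVerse endScopes to the front of the run and leaves every other item in its original relative order; whether \c and \cp need the same treatment is not covered.

```javascript
// items: adjacent scope items, each with "subType" and "label" fields.
function reorderScopeRun(items) {
  const isPrintVerseEnd = (item) =>
    item.subType === "endScope" && item.label.startsWith("printVerse/");
  return [
    ...items.filter(isPrintVerseEnd),
    ...items.filter((item) => !isPrintVerseEnd(item)),
  ];
}
```

Applied to the six items listed, this puts the `printVerse/1b` endScope first, followed by the `verses/14` and `verse/14` endScopes and then the three startScopes in their existing order.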

Intro heading grafts

This requires a second parsing of introductions, looking only at the block level, to break out grafts for introduction headings (and maybe other blocks) as necessary.

Sidebars

These turn out to present unique challenges since they can contain \p etc which normally goes to the "main" sequence. (In introductions, titles, etc, the markup is different.)

Related to #14
