bdarcus / csl-next

An experimental reimagining of CSL

License: Mozilla Public License 2.0

TypeScript 100.00%

json-schema typescript yaml

csl-next's Issues

Add ContributorsGroup

If dates can work via parameters, perhaps contributors can also, at least as default behavior?

Intl.ListFormat is a newer feature in JS for more general list formatting.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/ListFormat

const vehicles = ['Motorcycle', 'Bus', 'Car'];

const formatter = new Intl.ListFormat('en', { style: 'long', type: 'conjunction' });
console.log(formatter.format(vehicles));
// Expected output: "Motorcycle, Bus, and Car"

For us, type would always be "conjunction", and so just a Boolean.

Contributors is just a generic list, then, so can use generic parameters for that, plus something for personal names, and labels.

contributors:
  andAs: symbol
  nameAsSort: first
  labelWrap: parentheses

So I actually think we're covered here; the JS code just might be useful for implementation.

The ListFormat style option is one of:

long
short
narrow
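
As a rough illustration of how the contributor parameters above might drive list joining, here is a minimal sketch that does not depend on Intl.ListFormat. The `ContributorListOptions` type, the `"text"` value for `andAs`, and the `joinContributors` function are assumptions made for this example, not the project's actual model; only `andAs: symbol` comes from the issue.

```typescript
// Hypothetical option type; only `andAs: symbol` appears in the issue above.
type AndAs = "symbol" | "text";

interface ContributorListOptions {
  andAs?: AndAs;
}

// Join display names like an English "long" conjunction list,
// using "&" instead of "and" when andAs is "symbol".
function joinContributors(
  names: string[],
  options: ContributorListOptions = {},
): string {
  const and = options.andAs === "symbol" ? "&" : "and";
  // Zero, one, or two names need no serial comma.
  if (names.length <= 2) return names.join(` ${and} `);
  const head = names.slice(0, -1).join(", ");
  const last = names.slice(-1).join("");
  return `${head}, ${and} ${last}`;
}

console.log(joinContributors(["Doe", "Smith", "Jones"], { andAs: "symbol" }));
// "Doe, Smith, & Jones"
```

A real implementation would presumably delegate the punctuation to Intl.ListFormat or to locale data, since the serial comma and the conjunction are locale-specific.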

Project config

I'm finding the node/js ecosystem overwhelming, but am thinking I may want to standardize on newer features (like ES modules) and build packages (esbuild), since this is new, and experimental.

This starter kit might be helpful for that; it sets up a build system based on esbuild, adds github CI, linting, etc., along with a working app with example src tree organization.

https://github.com/FreekMencke/node-ts-starter-cli

Maybe use rome instead of prettier and eslint though?

❯ npx node-ts-starter-cli create csl-next-ts -g -l -p

So it should be easy to adapt the current source structure to this, perhaps keeping a version of the basic app that can improve as the code gets implemented. E.g.:

return the JSON input bibliography
process 1 and return author and title
#2
etc

Maybe along with this (or maybe bun?) for running ts directly?

https://www.npmjs.com/package/tsx

Here's the src tree it creates:

❯ tree src
src
├── app
│   ├── app.ts
│   └── common
│       └── logger.ts
├── config
│   └── config.ts
├── main.ts
└── typings
    └── typings.d.ts

{
  "name": "csl-next.js",
  "version": "0.0.1",
  "license": "MIT",
  "scripts": {
    "build": "node ./build/esbuild.js --dev",
    "build:meta": "node ./build/esbuild.js --dev --meta",
    "build:meta:prod": "node ./build/esbuild.js --meta",
    "build:prod": "node ./build/esbuild.js",
    "lint": "rome check . .ts,.js",
    "format": "rome format --write .",
    "format:ci": "rome ci .",
    "start": "node ./build/esbuild.js --dev --watch --run",
    "start:ci": "node ./build/esbuild.js --run",
    "start:prod": "node ./build/esbuild.js --watch --run"
  },
  "prettier": "./.prettierrc.json",
  "devDependencies": {
    "@es-exec/esbuild-plugin-start": "^0.0.4",
    "@types/node": "^18.14.1",
    "@typescript-eslint/eslint-plugin": "^5.53.0",
    "@typescript-eslint/parser": "^5.53.0",
    "edtf": "^4.4.1",
    "esbuild": "^0.17.10",
    "minimist": "^1.2.8",
    "rome": "^12.0.0",
    "typescript": "^4.9.5"
  }
}

node alternatives

Premature to worry about ATM, but the two big ones recently are deno and bun.

Both support typescript out-of-box, and both prioritize performance.

The former is different enough, however, that I don't see it as likely worth worrying about any compatibility, at least not for a while.

But bun aims to be a drop-in replacement for node. And it is extremely fast.

As for compatibility, this currently errors:

❯ bun run ./build/esbuild.js --dev --watch --run
✘ [ERROR] Expected value for define "VERSION" to be a string, got undefined instead

But this runs fine.

❯ bun run src/main.ts

It's premature for the processor prototype, since it doesn't do anything, but docs can provide a useful view of the models. For that reason, should integrate a make option that generates docs for style, citation, bibliography.

I find typedoc pretty nice; example usage here:

typedoc src/style.ts src/citation.ts src/bibliography

Here's a screenshot of its output.

See config options.

There are extensions too, like this one:

https://www.npmjs.com/package/typedoc-umlclass

Also, not sure what to make of this:

❯ typedoc src/style.ts
[warning] AffixType, defined in ./src/style.ts, is referenced by Group.affixes but not included in the documentation.
[warning] TemplateModel, defined in ./src/style.ts, is referenced by NamedTemplate.template but not included in the documentation.
[warning] SubstitutionType, defined in ./src/style.ts, is referenced by OptionGroup.substitute but not included in the documentation.
[warning] GroupAffixLevel, defined in ./src/style.ts, is referenced by RefList.groupAffixLevel but not included in the documentation.
[warning] Bibliography, defined in ./src/style.ts, is referenced by Style.bibliography but not included in the documentation.
[warning] CategoryType, defined in ./src/style.ts, is referenced by Style.categories but not included in the documentation.
[warning] Citation, defined in ./src/style.ts, is referenced by Style.citation but not included in the documentation.

Locator notes

This description of how the org-cite CSL processor parses strings to extract lists of locators is excellent.

Seems to have derived from citeproc-org.

We should add it somewhere to CSL, if we haven't already.

;; CSL styles recognize "locator" in citation references' suffix.  For example,
;; in the citation
;;
;;     [cite:see @Tarski-1965 chapter 1, for an example]
;;
;; "chapter 1" is the locator.  The whole citation is rendered as
;;
;;     (see Tarski 1965, chap. 1 for an example)
;;
;; in the default CSL style.
;;
;; The locator starts with a locator term, among "bk.", "bks.", "book", "chap.",
;; "chaps.", "chapter", "col.", "cols.", "column", "figure", "fig.", "figs.",
;; "folio", "fol.", "fols.", "number", "no.", "nos.", "line", "l.", "ll.",
;; "note", "n.", "nn.", "opus", "op.", "opp.", "page", "p.", "pp.", "paragraph",
;; "para.", "paras.", "¶", "¶¶", "§", "§§", "part", "pt.", "pts.", "section",
;; "sec.", "secs.", "sub verbo", "s.v.", "s.vv.", "verse", "v.", "vv.",
;; "volume", "vol.", and "vols.".  It ends with the last comma or digit in the
;; suffix, whichever comes last, or runs till the end of the suffix.
;;
;; The part of the suffix before the locator is appended to reference's prefix.
;; If no locator term is used, but a number is present, then "page" is assumed.
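
The parsing rules quoted above can be sketched in TypeScript roughly as follows. This is only an illustration under stated assumptions: `LOCATOR_TERMS` covers a small subset of the listed terms, the label names, the `ParsedSuffix` shape, and the `parseSuffix` function are hypothetical, matching is case-sensitive, and a terminating comma is treated as a separator rather than part of the locator (which is how the rendered example reads).

```typescript
// A small subset of the locator terms listed above, mapped to a label.
// The mapping and the label names are assumptions made for this sketch.
const LOCATOR_TERMS: Record<string, string> = {
  "chapter": "chapter", "chap.": "chapter", "chaps.": "chapter",
  "page": "page", "p.": "page", "pp.": "page",
  "section": "section", "sec.": "section", "secs.": "section",
  "volume": "volume", "vol.": "volume", "vols.": "volume",
};

interface ParsedSuffix {
  beforeLocator: string; // text to append to the reference's prefix
  label: string;         // "page" when no locator term is present
  locator: string;       // e.g. "1" or "12-15"
  suffix: string;        // whatever follows the locator
}

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function parseSuffix(raw: string): ParsedSuffix | null {
  // Longest terms first, so "pp." wins over "p.".
  const alternation = Object.keys(LOCATOR_TERMS)
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join("|");
  const term = new RegExp(`(^|\\s)(${alternation})(?=\\s|\\d|$)`).exec(raw);

  let start: number;
  let bodyStart: number;
  let label: string;
  if (term !== null) {
    start = term.index + term[1].length;
    bodyStart = start + term[2].length;
    label = LOCATOR_TERMS[term[2]];
  } else {
    // No locator term: a bare number is assumed to be a page.
    start = raw.search(/\d/);
    if (start === -1) return null;
    bodyStart = start;
    label = "page";
  }

  // The locator ends at the last comma or digit, whichever comes last,
  // or runs to the end of the suffix when neither is present.
  const body = raw.slice(bodyStart);
  const lastDigit = body.search(/\d(?!.*\d)/);
  const end = Math.max(body.lastIndexOf(","), lastDigit);
  const endsWithComma = end !== -1 && body.charAt(end) === ",";
  const cut = end === -1 ? body.length : endsWithComma ? end : end + 1;

  return {
    beforeLocator: raw.slice(0, start).trim(),
    label,
    locator: body.slice(0, cut).trim(),
    suffix: end === -1 ? "" : body.slice(end + 1),
  };
}

console.log(parseSuffix(" chapter 1, for an example"));
// { beforeLocator: "", label: "chapter", locator: "1", suffix: " for an example" }
```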

Consistently use class, type, interface

Generally, I've settled on:

use interface for input-only data modeling
use type for the same, where interface won't work (like the index signature for InputReference)
use class for implementing processing logic

Also, set default values for all parameter options, and maybe interface fields.
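
To make the convention concrete, here is a small sketch of the three roles side by side. All names in it (`InputPerson`, `InputReferences`, `NameOptions`, `PersonFormatter`) and the `"all"` and `"none"` values are hypothetical stand-ins rather than the project's actual definitions; `nameAsSort: first` is borrowed from the contributors example elsewhere in these issues.

```typescript
// interface: input-only data.
interface InputPerson {
  family: string;
  given?: string;
}

// type: input data keyed by an arbitrary id.
type InputReferences = Record<string, { title: string; author?: InputPerson[] }>;

// Options are optional on input; the defaults live in one place.
interface NameOptions {
  nameAsSort?: "first" | "all" | "none";
}
const defaultNameOptions: Required<NameOptions> = { nameAsSort: "none" };

// class: processing logic, with default values for its parameters.
class PersonFormatter {
  private options: Required<NameOptions>;

  constructor(options: NameOptions = {}) {
    this.options = { ...defaultNameOptions, ...options };
  }

  // `position` is the index of the person within its contributor list.
  format(person: InputPerson, position = 0): string {
    const given = person.given ?? "";
    if (given === "") return person.family;
    const sortOrder =
      this.options.nameAsSort === "all" ||
      (this.options.nameAsSort === "first" && position === 0);
    return sortOrder ? `${person.family}, ${given}` : `${given} ${person.family}`;
  }
}

const refs: InputReferences = {
  doe2020: { title: "A Book", author: [{ family: "Doe", given: "Jane" }] },
};
const formatter = new PersonFormatter({ nameAsSort: "first" });
console.log(formatter.format({ family: "Doe", given: "Jane" })); // "Doe, Jane"
console.log(Object.keys(refs).length); // 1
```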

https://www.typescripttutorial.net/typescript-tutorial/typescript-default-parameters/

https://bobbyhadz.com/blog/typescript-interface-default-values
https://timmousk.com/blog/typescript-interface-default-value/

This is currently inconsistently implemented.

Integrate Ajv

I added AJV, intending to use it for validation of the input.

But it looks like it may have much more:

https://ajv.js.org/guide/typescript.html

Makefile could use some attention

clean needs to be expanded
confirm it all works as it should, and fix if not

Priorities, milestones

As preface, I've been experimenting in parallel with a Rust based implementation here:

https://github.com/bdarcus/csln

I'm leaning towards prioritizing that going forward, but plan to align the models. But the success of either project will depend on contributions from others, so we'll see how they go.

Status

At this point, I'm confident in the direction of the basic style model, though the details are in need of wider review and testing.

I have, however, checked my assumptions against what I'm able to glean from the existing style repository using ripgrep.

Doing so shows that in the areas where I've moved logic from templates/macros to parameters, styles show a lot of duplication, which suggests in retrospect that level of control is not needed in the template language itself.

Milestones

So I've set up some obvious tentative milestones, and am already ahead of that schedule.

Ideally if this works out, I can transfer this project to the CSL GitHub org, and it can be team managed and developed further.

Contributing

If you are interested in potentially submitting a PR, let me know before you start; I've been somewhat liberal about rewriting the git history of the main branch so far (though I've avoided this more recently, as things are looking more stable)!

PS - having no previous experience with the js/node ecosystem, I've just opted to use tools here that bring me joy 😊

They're generally easy-to-use, with good UIs, high performance, and minimal dependencies.

Notes on quicktype

Quicktype seems to do a pretty good job of generating model code from JSON schemas to different languages, most notably Rust, Haskell, and Go.

It even includes parsing code in the generated code!

But I just realized it also has "experimental" support for direct conversion from typescript.

To assess, then, typescript-json-schema vs quicktype for:

the JSON schemas
other code

Basically, it may be possible to remove the first, and standardize on the second.

I'm trying to find some sort of library or tool that can programmatically evaluate schema quality and performance, but have not had any luck so far.

A quick grep for "$ref" shows equivalent numbers, but numbers for metadata properties are higher for quicktype. Need to check what that means.

Both validate the same.

But tjs has lots of annotation options to customize output. Does quicktype?

chore: add CI

Adapted some ideas from node-ts-starter-cli.

PS - been experimenting with lefthook for commit hooks. This matches the CI, and formats staged files:

# lefthook.yml
# lefthook add pre-commit

pre-commit:
  commands:
    format:
      glob: "*.ts"
      run: npx rome format --write {staged_files}

Add property to named template for substitution

Follow-up to #27.

Add an optional property to named templates that signals the template to use for x substitution.

What to name it, though?

Processing dates

Intl.DateTimeFormat

Configs:

console.log(new Intl.DateTimeFormat('en-US', { month: 'long', day: 'numeric' }).format(date));
// "December 19"

console.log(new Intl.DateTimeFormat('es', { dateStyle: 'long' }).format(date));
// "19 de diciembre de 2020"

console.log(new Intl.DateTimeFormat('en-GB', { dateStyle: 'long' }).format(date));
// "19 December 2020"

console.log(new Intl.DateTimeFormat('en-US', { dateStyle: 'long' }).format(date));
// "December 19, 2020"

EDTF

EDTF.js is strict about EDTF string parsing; it will error if input is not valid.

So I'll probably have to first confirm input is valid (using isEDTF() or something similar) and only run edtf() if it is, and pass it through if not.

And while toLocaleDateString will work with standard date-times, it won't with intervals. So start with the former, and worry about the rest later.

Example:

[...edtf('2001/2002-08~').values].map(d => format(d, 'en')).join(' until ')
//-> '2001 until ca. 8/2002'
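
The "validate first, pass through otherwise" idea can be sketched without the edtf dependency. In the example below a deliberately tiny regular expression stands in for real EDTF validation: it only accepts plain YYYY, YYYY-MM, and YYYY-MM-DD strings, so intervals and uncertain dates fall through unchanged. The `SIMPLE_DATE` pattern and the `formatDateOrPassThrough` function are assumptions for illustration only.

```typescript
// Matches only plain calendar dates: YYYY, YYYY-MM, or YYYY-MM-DD.
// Real EDTF is much richer (intervals, uncertainty, seasons); this is a stand-in.
const SIMPLE_DATE = /^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$/;

function formatDateOrPassThrough(input: string, locale: string): string {
  const m = SIMPLE_DATE.exec(input);
  if (m === null) return input; // not recognized: pass the literal through

  const year = Number(m[1]);
  const month = m[2] === undefined ? undefined : Number(m[2]);
  const day = m[3] === undefined ? undefined : Number(m[3]);
  if (month !== undefined && (month < 1 || month > 12)) return input;
  if (day !== undefined && (day < 1 || day > 31)) return input;

  // Only ask Intl for the parts the input actually carries.
  const options: Intl.DateTimeFormatOptions = { year: "numeric", timeZone: "UTC" };
  if (month !== undefined) options.month = "long";
  if (day !== undefined) options.day = "numeric";

  const date = new Date(Date.UTC(year, (month ?? 1) - 1, day ?? 1));
  return new Intl.DateTimeFormat(locale, options).format(date);
}

console.log(formatDateOrPassThrough("2016-10", "en-US")); // "October 2016"
console.log(formatDateOrPassThrough("2001/2002-08~", "en-US")); // "2001/2002-08~"
```

Building the date in UTC and formatting with `timeZone: "UTC"` keeps the rendered month from shifting with the local time zone.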

Configuration

I've already added this to the Style options, but am now getting to implementing.

Perhaps the "format" piece could map onto this lookup const?

import edtf from "edtf";

const d1 = edtf("2016-10");

// Intl month styles are "long", "short", "narrow", "numeric", or "2-digit";
// "full" is not one of them and makes toLocaleDateString throw a RangeError.
const date_config = {
    month: "long"
};

// use these, and lookup long vs short in the date options?
const dateFormats = {
    year: { year: "numeric" },
    monthDay: { month: date_config.month, day: "numeric" },
    // assumption: "full" was meant to include the year
    full: { year: "numeric", month: date_config.month, day: "numeric" },
}

console.log(d1.toLocaleDateString("en-us", dateFormats.monthDay));
console.log(d1.toLocaleDateString("en-us", dateFormats.full));
console.log(d1.year);

Review/refine style model

Intro

Try deno task docs for documentation.

I estimate the model is roughly 80 percent complete, but I'm increasingly convinced it's a solid foundation.

I'm a TS newbie. While I now better understand the details of how to model with it, this could still use review from people more knowledgeable on the technical details.

Model description and comparison to CSL 1.0

High-level

At a high-level, there are extensible groups of parameters ("options"), and there are templates.

Templates can be inline, or referenced, as in CSL 1.0.

So far, there's nothing in that description that is any different than 1.0, other than names (and that notion of "extensible").

But the first change is named templates can also be contained in external template files, which is a minor change we could apply to 1.0 also.

The more fundamental change is that I am putting much more logic in the parameters, and trying to leave it out of the templates.

Templates

TemplateModel is currently defined as follows, with separate interfaces for different data types.

type TemplateModel = 
  | RenderList  
  | RenderItemTemplate
  | RenderItemSimple 
  | RenderItemContributorList
  | RenderItemLocatorList
  | RenderItemDate
  | RenderItemTitle
  | Cond
  ;

Templates are just flexible lists of objects (RenderItem), lists/arrays (RenderList), and a conditional (Cond).

In CSL 1.0, for example, we have cs:layout, and there can only be one for each cs:citation and cs:bibliography element, and only one of each of those.

This model throws that out, and one can use a top-level Cond structure to support different features, like local citation commands or styles, or multilingual, without changing the basic model (aside from adding a new Condition property).

There remains a single citation and bibliography property in a Style, but they each are much more flexible.

Here's the citation definition, also explicitly defined as a List:

citation?: RenderItemCitationList;
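
To show how such a union might be walked, here is a heavily simplified sketch. The three-member `Template` union, the `items` field, and the `render` function are hypothetical; the property names `variable`, `delimiter`, `when`, `else`, and `isRefType` are borrowed from the generated schema shown further down, but their shapes here are assumptions.

```typescript
// A reference is reduced to a flat bag of string variables for this sketch.
type Reference = { type: string; [variable: string]: string };

interface RenderItem { variable: string; }
interface RenderList { delimiter?: string; items: Template[]; }
interface Condition { isRefType: string[]; items: Template[]; }
interface Cond { when: Condition[]; else?: Template[]; }

type Template = RenderItem | RenderList | Cond;

function renderAll(items: Template[], ref: Reference, delimiter: string): string {
  return items
    .map((item) => render(item, ref))
    .filter((s) => s !== "") // skip variables the reference lacks
    .join(delimiter);
}

function render(t: Template, ref: Reference): string {
  if ("variable" in t) return ref[t.variable] ?? "";
  if ("when" in t) {
    // Use the first condition that matches the reference type, else the fallback.
    const hit = t.when.filter((c) => c.isRefType.indexOf(ref.type) !== -1)[0];
    return renderAll(hit ? hit.items : t.else ?? [], ref, " ");
  }
  return renderAll(t.items, ref, t.delimiter ?? " ");
}

const bibliographyEntry: Template = {
  delimiter: ". ",
  items: [
    { variable: "author" },
    { variable: "title" },
    {
      when: [{ isRefType: ["book"], items: [{ variable: "publisher" }] }],
      else: [{ variable: "container-title" }],
    },
  ],
};

console.log(render(bibliographyEntry, {
  type: "book", author: "Doe, Jane", title: "A Book", publisher: "ABC Press",
}));
// "Doe, Jane. A Book. ABC Press"
```

Because the conditional is just another member of the union, it can sit at the top level or anywhere inside a list without a separate layout construct.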

On Extensibility

Effectively how I'm defining the OptionGroup interfaces would be equivalent to allowing foreign attributes in CSL 1.0.

The idea is to allow extension in an area that wouldn't break parsers, so evolution going forward is easier.

Something similar could be done in 1.0, but would certainly be more difficult, both to implement the schema changes, and to update processors and styles.

Model conversion to other languages

One reason to do the modeling in TS is it cleanly converts to JSON Schema, and I'm pretty certain to other languages (Rust, etc.).

TypeScript has a third-party library that I am using here to auto-convert these models to what appear to be compliant and well-defined (if verbose) JSON Schemas.

Quicktype also appears, at first glance, to do a reasonable job of converting JSON Schema (with experimental support for TypeScript itself) to model code in different languages, including Rust, Swift, and Haskell.

There's also this for TS to Lua:

https://typescripttolua.github.io/

Here's the style schema converted to Rust (which does compile without adjustment):

// Example code that deserializes and serializes the model.
// extern crate serde;
// #[macro_use]
// extern crate serde_derive;
// extern crate serde_json;
//
// use generated_module::Style;
//
// fn main() {
//     let json = r#"{"answer": 42}"#;
//     let model: Style = serde_json::from_str(&json).unwrap();
// }

use serde::{Serialize, Deserialize};

/// A CSL Style.
#[derive(Serialize, Deserialize)]
pub struct Style {
    /// The bibliography specification.
    #[serde(rename = "bibliography")]
    bibliography: Option<Bibliography>,

    /// The categories the style belongs to; for purposes of indexing.
    #[serde(rename = "categories")]
    categories: Option<Vec<CategoryType>>,

    /// The citation specification.
    #[serde(rename = "citation")]
    citation: Option<Citation>,

    /// The description of the style.
    #[serde(rename = "description")]
    description: Option<String>,

    /// The machine-readable token that uniquely identifies the style.
    #[serde(rename = "id")]
    id: Option<String>,

    /// Global parameter options.
    #[serde(rename = "options")]
    options: Option<OptionGroup>,

    /// The templates for rendering the bibliography and citations.
    #[serde(rename = "templates")]
    templates: Option<Vec<NamedTemplate>>,

    /// The human-readable name of the style.
    #[serde(rename = "title")]
    title: Option<String>,
}

/// The bibliography specification.
#[derive(Serialize, Deserialize)]
pub struct Bibliography {
    #[serde(rename = "bold")]
    bold: Option<bool>,

    /// The string with which to join two or more rendering components.
    #[serde(rename = "delimiter")]
    delimiter: Option<String>,

    #[serde(rename = "emph")]
    emph: Option<bool>,

    /// The rendering instructions; either called template name, or inline instructions.
    #[serde(rename = "format")]
    format: Option<BibliographyFormat>,

    #[serde(rename = "heading")]
    heading: Option<String>,

    #[serde(rename = "listStyle")]
    list_style: Option<String>,

    #[serde(rename = "options")]
    options: Option<OptionGroup>,

    /// The symbol pair to wrap around one or more rendering components.
    /// Interaction with surrounding punctuation is localized.
    #[serde(rename = "wrap")]
    wrap: Option<WrapType>,
}

#[derive(Serialize, Deserialize)]
pub struct Condition {
    /// When a match, process these templates.
    #[serde(rename = "format")]
    format: Vec<TemplateModel>,

    /// Is the item variable a number?
    #[serde(rename = "isNumber")]
    is_number: Option<LocatorType>,

    /// A list of reference item types; if one is true, then return true.
    #[serde(rename = "match")]
    condition_match: Option<MatchType>,

    /// Does the date conform to EDTF?
    #[serde(rename = "isEDTFDate")]
    is_edtf_date: Option<DateType>,

    /// Is the item reference type among the listed reference types?
    #[serde(rename = "isRefType")]
    is_ref_type: Option<Vec<RefType>>,

    /// Does the item reference include one of the listed variables?
    #[serde(rename = "hasVariable")]
    has_variable: Option<Vec<VariableType>>,

    /// The item reference locale; to allow multilingual output.
    #[serde(rename = "locale")]
    locale: Option<String>,
}

/// A template that is defined inline.
///
/// Integral citations are those where the author is printed inline in the text; aka "in
/// text" or "narrative" citations.
///
/// Non-integral citations are those where the author is incorporated in the citation, and
/// not printed inline in the text.
#[derive(Serialize, Deserialize)]
pub struct TemplateModel {
    #[serde(rename = "bold")]
    bold: Option<bool>,

    /// The string with which to join two or more rendering components.
    #[serde(rename = "delimiter")]
    delimiter: Option<String>,

    #[serde(rename = "emph")]
    emph: Option<bool>,

    /// The rendering instructions; either called template name, or inline instructions.
    #[serde(rename = "format")]
    format: Option<TemplateModelFormat>,

    #[serde(rename = "options")]
    options: Option<OptionGroup>,

    /// The symbol pair to wrap around one or more rendering components.
    /// Interaction with surrounding punctuation is localized.
    #[serde(rename = "wrap")]
    wrap: Option<WrapType>,

    /// The template name to use for partial formatting.
    #[serde(rename = "template")]
    template: Option<String>,

    #[serde(rename = "variable")]
    variable: Option<Type>,

    /// When all of the when conditions are nil, format the children.
    #[serde(rename = "else")]
    template_model_else: Option<Vec<TemplateModel>>,

    /// For the first condition that is non-nil, format the children.
    #[serde(rename = "when")]
    when: Option<Vec<Condition>>,
}

/// Parameter groups.
///
/// Global parameter options.
#[derive(Serialize, Deserialize)]
pub struct OptionGroup {
    /// Date formatting configuration.
    #[serde(rename = "dateFormatting")]
    date_formatting: Option<DateFormatting>,

    /// Disambiguation configuration of rendered group display names.
    #[serde(rename = "disambiguate")]
    disambiguate: Option<Disambiguation>,

    /// Grouping configuration.
    #[serde(rename = "group")]
    group: Option<Vec<GroupSortType>>,

    /// Localization configuration.
    #[serde(rename = "localization")]
    localization: Option<Localization>,

    /// Sorting configuration.
    #[serde(rename = "sort")]
    sort: Option<Vec<Sort>>,

    /// Substitution configuration.
    #[serde(rename = "substitute")]
    substitute: Option<Substitution>,
}

/// Date formatting configuration.
#[derive(Serialize, Deserialize)]
pub struct DateFormatting {
    #[serde(rename = "date")]
    date: Option<EStyle>,

    #[serde(rename = "month")]
    month: Option<MonthStyle>,

    #[serde(rename = "time")]
    time: Option<EStyle>,

    #[serde(rename = "year")]
    year: Option<YearStyle>,
}

/// Disambiguation configuration of rendered group display names.
///
/// Disambiguation of rendered group display name configuration.
#[derive(Serialize, Deserialize)]
pub struct Disambiguation {
    #[serde(rename = "addNames")]
    add_names: Option<AddNames>,

    #[serde(rename = "addYearSuffix")]
    add_year_suffix: Option<bool>,
}

/// Localization configuration.
///
/// Terms and data localization configuration.
#[derive(Serialize, Deserialize)]
pub struct Localization {
    /// The scope to use for localization.
    ///
    /// "per-item" uses the locale of the reference item, and "global" uses the target language
    /// across all references.
    #[serde(rename = "scope")]
    scope: Option<Scope>,
}

/// Reference sorting configuration.
#[derive(Serialize, Deserialize)]
pub struct Sort {
    #[serde(rename = "key")]
    key: GroupSortType,

    #[serde(rename = "order")]
    order: Order,
}

/// Substitution configuration.
///
/// Substitution of variable configuration.
#[derive(Serialize, Deserialize)]
pub struct Substitution {
    /// When author is nil, substitute the first non-nil listed variable.
    /// Once a substitution is made, the substituted variable shall be set to nil for purposes of
    /// later rendering.
    #[serde(rename = "author")]
    author: Vec<SubstitutionType>,
}

/// The citation specification.
#[derive(Serialize, Deserialize)]
pub struct Citation {
    #[serde(rename = "bold")]
    bold: Option<bool>,

    /// The string with which to join two or more rendering components.
    #[serde(rename = "delimiter")]
    delimiter: Option<String>,

    #[serde(rename = "emph")]
    emph: Option<bool>,

    /// The rendering instructions; either called template name, or inline instructions.
    #[serde(rename = "format")]
    format: Option<BibliographyFormat>,

    /// Integral citations are those where the author is printed inline in the text; aka "in
    /// text" or "narrative" citations.
    #[serde(rename = "integral")]
    integral: Option<RenderList>,

    /// Non-integral citations are those where the author is incorporated in the citation, and
    /// not printed inline in the text.
    #[serde(rename = "nonIntegral")]
    non_integral: Option<RenderList>,

    #[serde(rename = "options")]
    options: Option<OptionGroup>,

    #[serde(rename = "placement")]
    placement: Option<Placement>,

    /// The symbol pair to wrap around one or more rendering components.
    /// Interaction with surrounding punctuation is localized.
    #[serde(rename = "wrap")]
    wrap: Option<WrapType>,
}

/// Integral citations are those where the author is printed inline in the text; aka "in
/// text" or "narrative" citations.
///
/// Non-integral citations are those where the author is incorporated in the citation, and
/// not printed inline in the text.
#[derive(Serialize, Deserialize)]
pub struct RenderList {
    #[serde(rename = "bold")]
    bold: Option<bool>,

    /// The string with which to join two or more rendering components.
    #[serde(rename = "delimiter")]
    delimiter: Option<String>,

    #[serde(rename = "emph")]
    emph: Option<bool>,

    /// The rendering instructions; either called template name, or inline instructions.
    #[serde(rename = "format")]
    format: Option<BibliographyFormat>,

    #[serde(rename = "options")]
    options: Option<OptionGroup>,

    /// The symbol pair to wrap around one or more rendering components.
    /// Interaction with surrounding punctuation is localized.
    #[serde(rename = "wrap")]
    wrap: Option<WrapType>,
}

#[derive(Serialize, Deserialize)]
pub struct NamedTemplate {
    /// The name token for the template, for reference from other templates.
    #[serde(rename = "name")]
    name: String,

    #[serde(rename = "options")]
    options: Option<OptionGroup>,

    #[serde(rename = "template")]
    template: Vec<TemplateModel>,
}

#[derive(Serialize, Deserialize)]
#[serde(untagged)]
pub enum BibliographyFormat {
    String(String),

    TemplateModelArray(Vec<TemplateModel>),
}

#[derive(Serialize, Deserialize)]
#[serde(untagged)]
pub enum TemplateModelFormat {
    String(String),

    TemplateModelArray(Vec<TemplateModel>),
}

/// A list of reference item types; if one is true, then return true.
#[derive(Serialize, Deserialize)]
pub enum MatchType {
    #[serde(rename = "all")]
    All,

    #[serde(rename = "any")]
    Any,

    #[serde(rename = "none")]
    None,
}

#[derive(Serialize, Deserialize)]
pub enum VariableType {
    #[serde(rename = "article")]
    Article,

    #[serde(rename = "author")]
    Author,

    #[serde(rename = "book")]
    Book,

    #[serde(rename = "chapter")]
    Chapter,

    #[serde(rename = "container-title")]
    ContainerTitle,

    #[serde(rename = "editor")]
    Editor,

    #[serde(rename = "issue")]
    Issue,

    #[serde(rename = "issued")]
    Issued,

    #[serde(rename = "pages")]
    Pages,

    #[serde(rename = "publisher")]
    Publisher,

    #[serde(rename = "title")]
    Title,

    #[serde(rename = "volume")]
    Volume,
}

/// Does the date conform to EDTF?
#[derive(Serialize, Deserialize)]
pub enum DateType {
    #[serde(rename = "issued")]
    Issued,
}

/// Is the item variable a number?
#[derive(Serialize, Deserialize)]
pub enum LocatorType {
    #[serde(rename = "chapter")]
    Chapter,

    #[serde(rename = "page")]
    Page,
}

#[derive(Serialize, Deserialize)]
pub enum RefType {
    #[serde(rename = "article")]
    Article,

    #[serde(rename = "book")]
    Book,

    #[serde(rename = "chapter")]
    Chapter,
}

#[derive(Serialize, Deserialize)]
pub enum EStyle {
    #[serde(rename = "full")]
    Full,

    #[serde(rename = "long")]
    Long,

    #[serde(rename = "medium")]
    Medium,

    #[serde(rename = "short")]
    Short,
}

#[derive(Serialize, Deserialize)]
pub enum MonthStyle {
    #[serde(rename = "long")]
    Long,

    #[serde(rename = "narrow")]
    Narrow,

    #[serde(rename = "numeric")]
    Numeric,

    #[serde(rename = "short")]
    Short,

    #[serde(rename = "2-digit")]
    The2Digit,
}

#[derive(Serialize, Deserialize)]
pub enum YearStyle {
    #[serde(rename = "numeric")]
    Numeric,

    #[serde(rename = "2-digit")]
    The2Digit,
}

#[derive(Serialize, Deserialize)]
pub enum AddNames {
    #[serde(rename = "all")]
    All,

    #[serde(rename = "all-with-initials")]
    AllWithInitials,

    #[serde(rename = "by-cite")]
    ByCite,

    #[serde(rename = "primary")]
    Primary,

    #[serde(rename = "primary-with-initials")]
    PrimaryWithInitials,
}

#[derive(Serialize, Deserialize)]
pub enum GroupSortType {
    #[serde(rename = "as-cited")]
    AsCited,

    #[serde(rename = "author")]
    Author,

    #[serde(rename = "title")]
    Title,

    #[serde(rename = "year")]
    Year,
}

/// The scope to use for localization.
///
/// "per-item" uses the locale of the reference item, and "global" uses the target language
/// across all references.
#[derive(Serialize, Deserialize)]
pub enum Scope {
    #[serde(rename = "global")]
    Global,

    #[serde(rename = "per-item")]
    PerItem,
}

#[derive(Serialize, Deserialize)]
pub enum Order {
    #[serde(rename = "ascending")]
    Ascending,

    #[serde(rename = "descending")]
    Descending,
}

#[derive(Serialize, Deserialize)]
pub enum SubstitutionType {
    #[serde(rename = "editor")]
    Editor,

    #[serde(rename = "title")]
    Title,

    #[serde(rename = "translator")]
    Translator,
}

/// Is the item variable a number?
///
/// Does the date conform to EDTF?
#[derive(Serialize, Deserialize)]
pub enum Type {
    #[serde(rename = "author")]
    Author,

    #[serde(rename = "chapter")]
    Chapter,

    #[serde(rename = "container-title")]
    ContainerTitle,

    #[serde(rename = "editor")]
    Editor,

    #[serde(rename = "issue")]
    Issue,

    #[serde(rename = "issued")]
    Issued,

    #[serde(rename = "page")]
    Page,

    #[serde(rename = "pages")]
    Pages,

    #[serde(rename = "publisher")]
    Publisher,

    #[serde(rename = "title")]
    Title,

    #[serde(rename = "volume")]
    Volume,
}

/// The symbol pair to wrap around one or more rendering components.
/// Interaction with surrounding punctuation is localized.
#[derive(Serialize, Deserialize)]
pub enum WrapType {
    #[serde(rename = "brackets")]
    Brackets,

    #[serde(rename = "parentheses")]
    Parentheses,

    #[serde(rename = "quotes")]
    Quotes,
}

#[derive(Serialize, Deserialize)]
pub enum CategoryType {
    #[serde(rename = "biology")]
    Biology,

    #[serde(rename = "science")]
    Science,

    #[serde(rename = "social science")]
    SocialScience,
}

#[derive(Serialize, Deserialize)]
pub enum Placement {
    #[serde(rename = "inline")]
    Inline,

    #[serde(rename = "note")]
    Note,
}

I don't know Rust myself, but I'm thinking the type/interface model here is likely to translate pretty easily to its (or Swift's) structs, Haskell types, etc.

So I'm thinking this could be a useful reference implementation of a major breaking change, if it's warranted, while also confirming the reasonableness of any style format/model changes.

Editor support for schemas

You can also play with the schemas if you like; here is VSCode, with schema-backed validation and auto-complete.

Disambiguation clarification

If one is grouping citations and/or bibliography by author, I think the right way to do that programmatically is to group on normalized full name representations rather than family names; like say:

doe-j:smith-s:jones-k

The behavior and the displayed group name therefore diverge, and we use family name as shorthand in display, which can create conflicts.

But the grouping behavior would be the same regardless of display details.
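
A minimal sketch of such a grouping key, assuming it is built from the lowercased family name plus the first initial of the given name, as in the example string above. The `Name` shape and the `groupKey` function are hypothetical, and the normalization is deliberately naive (no handling of diacritics or non-Latin scripts).

```typescript
interface Name {
  family: string;
  given: string;
}

// Build a key like "doe-j:smith-s:jones-k" from a list of contributors.
function groupKey(names: Name[]): string {
  return names
    .map((n) => {
      const family = n.family.trim().toLowerCase().replace(/\s+/g, "-");
      const initial = n.given.trim().charAt(0).toLowerCase();
      return initial === "" ? family : `${family}-${initial}`;
    })
    .join(":");
}

const a = groupKey([
  { family: "Doe", given: "Jane" },
  { family: "Smith", given: "Sam" },
  { family: "Jones", given: "Kim" },
]);
console.log(a); // "doe-j:smith-s:jones-k"

// Two authors who would both display as "Doe" still land in different groups.
console.log(groupKey([{ family: "Doe", given: "Richard" }])); // "doe-r"
```

Disambiguation then only has to decide how to display group names that collide, while the grouping itself stays independent of display details.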

So maybe here when we say "disambiguation" we really mean of group display names?

I have clarified this in the jsdocs.

https://github.com/bdarcus/csl-next.js/blob/2360517a2d853571205ba23286951acb992b2f78/src/style.ts#L197-L200

It may suggest, however, moving this back to the GroupOption (though I prefer the simpler modeling)?

Add basic proof-of-concept formatting

We need basic citation and bibliography formatting, without worrying about nuances like disambiguation, to assess whether this is likely to work.

If I'm right, it should be relatively easy, and generate reasonable results.

Running deno task cli shows the current state. In the end, I'm thinking that should be a CLI that works similarly to the Haskell-based citeproc: it can take input of citations, references, and style, in JSON or YAML, and either:

Export JSON or other results
Operate as a JSON server to do the same.

For now, I have in mind rough priorities being in order:

#18
template rendering (this part isn't done in the model, but that's OK)

I have some placeholder functions for the first 3 and 4, though they don't work. I think the parameter specs are right though.

Some ideas on template formatting here: #114 (comment).

I also have Contributor classes with some basic methods that work.

> const p1 = new Person("Doe", "Jane");
> p1.getSortName();
"Jane, Doe"

Something's wrong with the JSON schema output

Switch to `ts-json-schema-generator` for schema generation

Seems typescript-json-schema is in maintenance mode, and the maintainer recommends this:

https://github.com/vega/ts-json-schema-generator

#83 may also be a possibility, though I doubt it.

Add term and text/str to `TemplateModel`

At first thought, I think something like this is the right approach:

- term: available
  emph: true
- text: foo bar

Looking at the style repo, it seems this is overwhelmingly used for localized prefixes, in particular, for content.

That also requires using cs:group and such.

So maybe instead could do:

- template: container-apa
  prefixTerm: in

Or maybe:

- template: container-apa
  prefix:
    term: in
    emph: true

Hook it up to `djot.js`

For rich subfield formatting, and also to format djot documents:

https://github.com/jgm/djot.js

This should also be easy, and fun.

import * as djot from "https://esm.sh/@djot/djot";

djot.renderHTML(djot.parse('_Title_ within a "title"'));

My question is mainly strategy (see #30); do we:

1. convert this AST to the Djot AST, and then let it do the full rendering?
2. convert this AST to djot markup?
3. instead inject the djot rendered content into this AST?

I guess the answer may depend on where and how it's integrated, but my impulse is to try 2 first.

Write a processor CLI

Along with #2, the main entry-point should be extended to mirror the haskell citeproc one, but make sure it also aligns with #5.

https://www.makeuseof.com/nodejs-cli-packages-build-tools-best/

Deno-specific:

https://cliffy.io/

Or perhaps this should just be a library, and the CLI a separate project?

Regardless of those details, need a CLI.

Fine-tune schema generation

Premature, but at some point, if this goes anywhere, we'll need to include id properties on the schemas, set some definitions to be extensible, etc.

https://github.com/YousefED/typescript-json-schema#command-line

Also see API docs, for additional annotations, including formats and examples.

https://github.com/YousefED/typescript-json-schema/blob/master/api.md#annotation-tjs

Also, compare to:

https://github.com/vega/ts-json-schema-generator

Both create pretty verbose schemas, which I haven't figured out how to rein in.

Add functions

https://www.npmjs.com/package/group-array

Consistently use either format or template

I lean towards template, though that does raise the singular (call) vs plural (inline) issue that needs more thought. Maybe in that case format is more generic?

See #36

References: object vs array?

Just occurred to me, so noting ...

A citekey is a unique identifier for a reference. In the current model, ID is a required property on Reference, which is collected in Bibliography under references, which is an array.

But an object or dictionary better matches the actual data:

references:
  doe1:
    title: Some Title

It's also more performant for lookup; this article, for example, concludes:

... working objects is comparatively much faster than the arrays when we don’t need orders.

Perhaps the input model should be the former, and a processor should simply transform it to a list for intermediate processing?
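
A minimal sketch of that transformation, assuming a keyed input model like the YAML above. The `Reference` and `InputReferences` shapes and the `toReferenceList` name are hypothetical and only illustrate the idea.

```typescript
// Hypothetical shapes for illustration; field names follow the YAML above.
interface Reference {
  id: string;
  title?: string;
}

// Input model: citekey -> reference data (no separate id property needed).
type InputReferences = Record<string, Omit<Reference, "id">>;

// Turn the keyed input into a list, copying each citekey onto its reference.
function toReferenceList(refs: InputReferences): Reference[] {
  return Object.entries(refs).map(([id, ref]) => ({ ...ref, id }));
}

const input: InputReferences = { doe1: { title: "Some Title" } };
const list = toReferenceList(input);
```

The keyed form keeps lookup by citekey direct, while the list form is what sorting and grouping would operate on.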

Aside: my initial implementation of what became CSL was XSLT 1.0, which is a purely functional language. So the processing there was all about transforming lists into other lists.

Rely more on parameters, less on conditional?

The TemplateModel and Cond structure is now pretty flexible, but potentially a bit confusing.

Here's what I think is a mostly correct way to handle configuring default and narrative citations.

citation:
  format:
    - when:
        - mode: narrative
          format:
            - groupBy: cs-author
              template: author-apa
              delimiter: ", "
              andAs: symbol
              format:
                - groupBy: cs-year
                  prefix: (
                  suffix: )
                  format:
                    - variable: issued
      else:
        - groupBy: cs-author-year # not sure this grouping is right
          prefix: (
          suffix: )
          delimiter: "; "
          format:
            - template: author-apa
            - variable: issued

Maybe we could define two kinds of conditions somehow, so maybe could do something like the below?

Only makes sense if we can clearly distinguish locale and mode from the other kinds of conditions? Maybe, in effect, both are kinds of modes?

---
citation:
  - modes:
      locale: es
      citation: narrative
    groupBy: cs-author
    delimiter: ", "
    andAs: symbol
    format:
      - template: author-apa
        suffix: " "
        format:
          - groupBy: cs-year
            prefix: (
            suffix: )
            variable: issued

In that model, there would be modes and conditions.

Does that distinction make sense though?

But then ...

if that makes sense, maybe can generalize that also?

citation:
  - modes:
      locale: es
      citation: narrative
  - predicates:
      isDate: issued
      isRefType: ["book"]
      match: any
    format:
      ...

The advantage is it's simpler syntax.

The disadvantage is it's less flexible.

Except, maybe if it was allowed in named templates also, that disadvantage goes away?

template:
  name: title-abc
  predicates:
    locale: es
    citation: narrative
    isDate: issued
    isRefType:
      - book
    match: all
  format: ...

OTOH, not sure how much it buys, or at what costs.
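
To make the predicates idea concrete, here is a rough sketch of how a processor might evaluate such a block. Everything in it is an assumption for illustration: the `Predicates` and `Ref` shapes, the `evaluate` name, and the choice to sketch only `isRefType` and `hasVariable`.

```typescript
type Match = "all" | "any" | "none";

// Hypothetical predicate set; only two tests are sketched here.
interface Predicates {
  isRefType?: string[];
  hasVariable?: string[];
  match: Match;
}

// Hypothetical reference shape: a type plus arbitrary variables.
interface Ref {
  type: string;
  [variable: string]: unknown;
}

function evaluate(p: Predicates, ref: Ref): boolean {
  // Run each test that is present and collect the individual results.
  const results: boolean[] = [];
  if (p.isRefType) results.push(p.isRefType.includes(ref.type));
  if (p.hasVariable) {
    results.push(p.hasVariable.every((v) => ref[v] !== undefined));
  }
  // Combine the results according to the match mode.
  const passed = results.filter(Boolean).length;
  switch (p.match) {
    case "all":
      return passed === results.length;
    case "any":
      return passed > 0;
    case "none":
      return passed === 0;
  }
}
```

Modes (locale, citation mode) could be checked once per processing run, while predicates like these would be checked per reference, which may be one way to draw the distinction.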

Align formatting (bold, emph, etc.) with djot AST

@jgm, when you get a chance, curious what you suggest on integrating djot.js here, for both subfield formatting, and ultimately to integrate it into djot document processing.

Edit: thinking about this more, may need to both use djot.parse on the fields and work with the AST otherwise? Basically, I'm thinking of adding a toDjotAST method.

But if I need to transform content to the AST, what's the best way to do that? Any suggestions on where to look in djot.js for inspiration?

Note to self: the djot playground includes filters!

Also: https://mfrachet.github.io/create-frontend-framework/templating/template-literals.html#abstracting-for-others-html-elements

As part of the more general #5.

Templates (aka macros in 1.0) look like this:

- variable: author
  emph: true
  wrap: parentheses

[
  {
    "variable": "author",
    "emph": true,
    "wrap": "parentheses"
  }
]

Basically, better to convert this to:

1. The Djot AST, or ...
2. The Djot string markup, and let djot.parse() handle the rest?

In 1, then, I'd run djot.parse() first, and in 2 last.

I know I can just create the AST from that, and then insert subfield content as children.

But one issue might be that this CSL model here is flat, while Djot is pretty hierarchical.

So with this, the emph is contained in that inner child object.

> djot.parse("hello *_there_*").children[0];
{
  tag: "para",
  children: [
    { tag: "str", text: "hello ", pos: undefined },
    {
      tag: "strong",
      children: [ { tag: "emph", children: [Array], pos: undefined } ],
      pos: undefined
    }
  ],
  attributes: undefined,
  pos: undefined
}
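
A rough sketch of the second option (emit Djot string markup and leave the nesting to djot.parse()). The `TemplateComponent` shape, the `strong` and `brackets` values, and the `toDjotMarkup` name are assumptions for illustration; it also assumes the value contains no characters that need escaping in Djot.

```typescript
// Hypothetical flat template component, modeled on the YAML/JSON above.
interface TemplateComponent {
  variable: string;
  emph?: boolean;
  strong?: boolean; // assumption: not shown in the example above
  wrap?: "parentheses" | "brackets"; // "brackets" is an assumption
}

// Turn flat formatting flags into nested Djot inline markup.
function toDjotMarkup(component: TemplateComponent, value: string): string {
  let out = value;
  if (component.emph) out = `_${out}_`; // Djot emphasis
  if (component.strong) out = `*${out}*`; // Djot strong
  if (component.wrap === "parentheses") out = `(${out})`;
  if (component.wrap === "brackets") out = `[${out}]`;
  return out;
}
```

Because the flags are applied inside-out in a fixed order, the flat model maps onto Djot's hierarchy without the style author having to express nesting.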

Contributor modeling

I need to figure this out soon, as processing functionality builds on it.

Right now, I have Contributor as an abstract class, which Person and Organization "extend".

This is the right approach, I think, from a modeling POV, but the deserialization code, which uses class-transformer, currently introduces one little limitation that bothers me: it requires a type property on the contributor:

- type: organization
  name: United Nations

I am trying to figure out a way around that, or at least to define a default when not present, but if that doesn't work, the alternative is a single contributor class.

class Contributor {
  name?: string;
  familyName?: string;
  givenName?: string;
  location?: string;
 ...

Complete example here, that works, but I'm not particularly fond of it.

https://gist.github.com/bdarcus/59d6d90783f29511a6551a19b7fca7bb

Or could ditch the class, and use an interface with functions.
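
A minimal sketch of that last alternative, assuming the kind of contributor can be inferred from which properties are present, so that no type property is needed in the data. The shapes and function names are hypothetical.

```typescript
// Hypothetical plain-data shapes; no class and no explicit "type" property.
interface Person {
  familyName: string;
  givenName: string;
}

interface Organization {
  name: string;
  location?: string;
}

type Contributor = Person | Organization;

// Infer the kind from the data: only organizations carry a "name" property.
const isOrganization = (c: Contributor): c is Organization => "name" in c;

// Free function in place of a method on a Contributor class.
function sortName(c: Contributor): string {
  return isOrganization(c) ? c.name : `${c.familyName}, ${c.givenName}`;
}
```

Plain JSON.parse output would already satisfy these interfaces, so this route would not depend on class-transformer at all; the cost is losing methods on the objects.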

Rather than require a locale file as input, allow a locale?

E.g.:

const citeProc = new Processor(data, style, "en-US");

On a related note, will need some way to indicate extra template files to load.

Maybe, templateCollections property that takes an array of string tokens?

Would still need:

a way to name the collections
ability to include arbitrary files

Change `variable` to `date` etc

Split into multiple properties by input data type.

Cleanup package stuff

Merged #52, and then quickly realized a couple of little issues:

still have typescript-eslint, and need to check if rome really replaces it
edtf is a dependency; not a dev-dependency

Make `Condition` an `interface`; and on extensibility

So how to do extension?

In CSL 1.0, we didn't, because we were unsure of the implications.

If we wanted to in XML, we would namespace all attributes, and allow foreign-namespaced attributes in certain places.

<if foo:bar="true" cs:variable="book">

That would be easy in some ways (probably take 30-60 minutes of schema work), but a major breaking change in others (code would need updating; also, all styles, though that wouldn't be hard).

In typescript, we export an interface, which allows us, or others, to add new properties, and then in the generated JSON Schema allow other properties in those "certain places" (like Condition).

Caveat: currently Condition is a type, which can't be extended in this way, so need to figure that out.

My read of the docs says it's not possible, and would somehow need to refactor Condition to use interfaces instead.

But maybe this is saying it is possible with type aliases?

type Foobar = 'FOO' | 'BAR';
type FoobarBaz = Foobar | 'BAZ'; // or: 'BAZ' | Foobar

Or here:

type PartialPointX = { x: number; };
type Point = PartialPointX & { y: number; };
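
For comparison, a small sketch of both routes applied to a cut-down, hypothetical `Condition`. The `isJurisdiction` property and all type names other than `Condition` are made up for illustration.

```typescript
// Hypothetical, cut-down Condition; the real one has more tests.
interface Condition {
  match: "all" | "any" | "none";
  isRefType?: string[];
}

// Route 1: a third party extends the exported interface.
interface LegalCondition extends Condition {
  isJurisdiction?: string[]; // hypothetical extension property
}

// Route 2: the same extension on a type alias, via an intersection.
type ConditionAlias = { match: "all" | "any" | "none"; isRefType?: string[] };
type LegalConditionAlias = ConditionAlias & { isJurisdiction?: string[] };

const viaInterface: LegalCondition = { match: "any", isJurisdiction: ["us"] };
const viaAlias: LegalConditionAlias = { match: "any", isJurisdiction: ["us"] };
```

Both produce a new, wider type. What a type alias cannot do is declaration merging, where others reopen the same name and add properties to it; only interfaces allow that. An interface also cannot extend a union type, so if Condition stays a union, the intersection route is the one that still applies.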

In earlier iterations of my thinking, I was imagining templates which could take arbitrary lists of piped function names.

If this is still promising:

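const sum = (n: number) => n + 10;
const double = (n: number) => n * 2;
const divide = (n: number) => n / 2;

// Each step takes a number and returns a number.
type Fn = (n: number) => number;

const combine = (result: number, nextFun: Fn) => nextFun(result);
const pipe = (...fns: Fn[]) => (x: number) => fns.reduce(combine, x);

const result = pipe(sum, double, divide)(10); // ((10 + 10) * 2) / 2 = 20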

https://www.thisdot.co/blog/functional-programming-in-typescript-using-the-fp-ts-library-pipe-and-flow

https://www.telerik.com/blogs/functional-programming-typescript

https://dev.to/ecyrbe/how-to-use-advanced-typescript-to-define-a-pipe-function-381h

https://dev.to/nexxeln/implementing-the-pipe-operator-in-typescript-30ip

Setup test suite

... perhaps ideally one inspired by the 1.0 one, at least ultimately, if there's a need?

Or maybe could write initially in something like jest or vitest, but later convert it to something language-agnostic?

Seems the primary downside for jest ATM is "experimental" support for ES modules, while vitest is based on ESM and esbuild.

EDIT: node has its own test runner now, marked stable in v20. Currently needs some tweaks to work with typescript, but I expect those to go away as they improve it.

https://github.com/scottwillmoore/node-test-with-typescript

If jest:

https://github.com/aelbore/esbuild-jest

Deserializing JSON styles and references to TS model?

#59 mostly addresses this; aside from use of the @Type decorator to address nested Contributor arrays.

Make citation locators an array

For this purpose, need to go back to the CSL 1.1 approach.

Not sure the best approach, as that's maybe heavier-weight than needed? I do think it's clean though from a modelling POV.

Add `DateOptionGroup`

My current plan, carried over from discussions for the v1.1 branch of CSL, standardizes on an EDTF string for input, and I'm hoping to convert to an EDTF Date object when deserializing in #46 (though this may be too ambitious, given there are many different types of objects in EDTF).

But formatting?

I'd prefer to avoid reintroducing the 1.0 date/date-part structures if possible, since it's another piece of template complexity.

The JS toLocaleDateString method looks like a pretty awesome approach.

> const d1 = new Date("2023-04-27T12:04:49Z")
> const dateConfig = { weekday: "long", year: "numeric", month: "short", day: "numeric"}
> d1.toLocaleDateString('en-us', dateConfig)
'Thursday, Apr 27, 2023'
> d1.toLocaleDateString('es', dateConfig)
'jueves, 27 de abr de 2023'
> d1.toLocaleDateString('de', dateConfig)
'Donnerstag, 27. Apr. 2023'
> d1.toLocaleDateString('ja', dateConfig)
'2023年4月27日木曜日'

import edtf from "edtf";

const d1 = edtf("2016-10");

const d_month_day = { month: "short", day: "numeric" };
const d_full = { year: "numeric", month: "long", day: "numeric" };

console.log(d1.toLocaleDateString("en-us", d_month_day));
console.log(d1.toLocaleDateString("es", d_full));
console.log(d1.year);

Of note:

it has some conceptual similarity to how we localize terms in 1.0.
it's configured in an object (order doesn't matter there)
details like order and punctuation are localized (though I don't know how or where)

See also the newer Intl.DateTimeFormat, which is similar:

> (new Intl.DateTimeFormat('en-GB', { dateStyle: 'full', timeStyle: 'long', timeZone: 'Australia/Sydney' }).format(d1));
'Thursday, 27 April 2023 at 22:04:49 GMT+10'

But I think we may be able to get away with adding Dates as an OptionGroup?

I think I'll try to go with this first, as it keeps things simple, but promises to be international-friendly.
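
A rough sketch of what a date option group could look like if it only names a form and leaves order and punctuation to Intl.DateTimeFormat. The `DateForm` names, the `dateForms` table, and `formatDate` are hypothetical; plain JS Date is used here rather than an EDTF object, and UTC is assumed so the local time zone does not shift the day.

```typescript
// Hypothetical named forms a style could choose between.
type DateForm = "year" | "year-month" | "full";

// Each form maps to standard Intl.DateTimeFormat options.
const dateForms: Record<DateForm, Intl.DateTimeFormatOptions> = {
  year: { year: "numeric" },
  "year-month": { year: "numeric", month: "long" },
  full: { year: "numeric", month: "long", day: "numeric" },
};

// Format in UTC so the result does not depend on the local time zone.
function formatDate(date: Date, form: DateForm, locale: string): string {
  const options = { ...dateForms[form], timeZone: "UTC" };
  return new Intl.DateTimeFormat(locale, options).format(date);
}

const october2016 = new Date(Date.UTC(2016, 9, 1)); // months are zero-based
console.log(formatDate(october2016, "year-month", "en-US"));
console.log(formatDate(october2016, "full", "es"));
```

The style would then carry only the form name per context, and localization of the rendered string would come from the runtime rather than from the style or locale files.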

If it didn't work, would need ability to localize within styles, with approaches like this:

date.format(now, 'ddd, MMM DD YYYY');

Add `options` to `citation` and `bibliography`

Also, I think options.group.key should be plural.

Refine `ProcReference`

This, and maybe set some default values.

constructor(...args) {
  super(...args);
}

See:

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Classes/constructor

Sort too limited

Probably need to revert back to the CSL 1.0 design for this:

sort:
  - key: author
    order: ascending
  - key: year
    order: descending
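
A minimal sketch of how a list of key/order pairs like the one above could drive a comparator. The `Ref` shape, with a pre-computed `author` sort string and numeric `year`, and the function names are assumptions for illustration.

```typescript
// Hypothetical pre-processed reference with ready-to-compare sort values.
interface Ref {
  author: string;
  year: number;
}

interface SortSpec {
  key: keyof Ref;
  order: "ascending" | "descending";
}

// Compare numbers numerically and everything else as locale-aware strings.
function compareValues(a: string | number, b: string | number): number {
  if (typeof a === "number" && typeof b === "number") return a - b;
  return String(a).localeCompare(String(b));
}

// Apply the sort specs in order; later keys only break ties on earlier ones.
function sortReferences(refs: Ref[], specs: SortSpec[]): Ref[] {
  return [...refs].sort((a, b) => {
    for (const { key, order } of specs) {
      const cmp = compareValues(a[key], b[key]);
      if (cmp !== 0) return order === "ascending" ? cmp : -cmp;
    }
    return 0;
  });
}
```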

Fix Sort modelling

This is wrong:

https://github.com/bdarcus/csl-next.js/blob/21ad0716d18eabbba7d76c6c371be6d0cdaa0948/src/style.ts#L32

With the last option, it should just be:

sort: as-cited

There's no need there for the more complicated modeling.

Typescript modeling questions

In RNC, I can do:

foo = ( one | two | three )+, four

That means, at least one of the first three, and a required "four".

I need to do something similar in places, like Condition, where I need to require a match property, and at least one of the other properties.

How do I model that?

Fixed in a78dea1, with this.
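
For reference, one common TypeScript pattern for "a required property plus at least one of the others" is sketched below. It is not necessarily what the commit above does; the `AtLeastOne` helper and the cut-down `Tests` shape are assumptions for illustration.

```typescript
// For each key K, build a variant where K is required and the rest optional,
// then take the union of all those variants.
type AtLeastOne<T> = {
  [K in keyof T]-?: Required<Pick<T, K>> & Partial<Omit<T, K>>;
}[keyof T];

// Hypothetical subset of the condition tests.
interface Tests {
  isRefType?: string[];
  hasVariable?: string[];
  isDate?: string;
}

// "match" is always required; at least one test must accompany it.
type Condition = { match: "all" | "any" | "none" } & AtLeastOne<Tests>;

const valid: Condition = { match: "any", hasVariable: ["issued"] };

// Rejected by the type checker, because no test is present:
// const invalid: Condition = { match: "any" };
```

One caveat is that schema generators may expand this into a verbose anyOf, so the generated JSON Schema would need checking.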

Complete `CSLDate`

This is a placeholder ATM that reflects the intention to define CSLDate as either a literal string, or EDTF date.

It would use third-party libraries for parsing and schema validation.

I suspect this will be fairly easy, with some interesting design choices for dealing with formatting of complex EDTF dates.

EDIT: The jsdoc format incantation to get EDTF validation into the schema is:

/**
 * @format edtf/level-1+season-intervals
 */

Just need to also allow fallback to literal string. Or maybe that's not possible, and better done elsewhere?

Substitutions as parameters?

I merged the simple solution below in the linked commit. Will need to think about the formatting question, however.

It may be enough to have the values in the array be template names?

Again, though we'd want to confirm this with the existing styles, it seems there are a small number of these, with by far the most important being for missing authors.

Perhaps we might do something like this globally?

substitutions:
  - author:
      - editor
      - translator
      - title

Or even:

authorSubstitutions:
  - editor
  - translator
  - title

I do realize there are some nuances here (like the above very common pattern often needs a macro for title formatting), but perhaps we can resolve that, and avoid adding back another piece of template complexity?
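
A minimal sketch of how a processor could apply a global substitution list like the one above. It assumes each variable has already been rendered to a string, which sidesteps the formatting nuance just mentioned; the `RenderedReference` shape and `resolveVariable` name are hypothetical.

```typescript
// Hypothetical flattened reference: variable name -> already-rendered string.
type RenderedReference = Partial<Record<string, string>>;

// Mirrors the YAML above: which variables may stand in for a missing one.
const substitutions: Record<string, string[]> = {
  author: ["editor", "translator", "title"],
};

// Return the variable itself if present, else the first available substitute.
function resolveVariable(
  ref: RenderedReference,
  variable: string,
): string | undefined {
  const candidates = [variable, ...(substitutions[variable] ?? [])];
  for (const name of candidates) {
    const value = ref[name];
    if (value) return value;
  }
  return undefined;
}
```

If the array values were template names instead of variable names, the lookup would stay the same and only the rendering step before it would change.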

PS - this command will match across the styles directory, and show the preceding six lines. Not surprisingly, the above logic is extremely common, though I don't know of a convenient way to write a script to quickly quantify that.

grep -B 6 "</substitute" ../csl/styles/*.csl

Confirm delimiter is available in the right places

Something seems a bit off ATM.

Need to be able to do stuff like this.

This currently is valid:

      - when:
          - hasVariable:
              - issued
              # - accessed
            match: any
            format:
              - delimiter: " "
                format:
                  - template: one
                  - template: two

... but this is not (and so is a bug):

       else:
         format:
           - delimiter: ", "
             format:
               - template: one
               - template: two

EDIT: I think that outer format isn't allowed there, because unnecessary.

But we shouldn't require duplication to change a delimiter.

I think, however, we need to allow and encourage the below (and maybe somehow also move that to a parameter)?

- delimiter: " "
  format: one

cc @adam3smith - here's an answer to one of your questions, though there's an inconsistency I need to fix.

Fill out models

If we get through #7, next step would be models for the other key components; notably citations, references, locators, and terms.

On the latter, my impression is the current CSL term model is sound, so it would just need translating into typescript.

Chore: "examples" commit type

Use chore instead.

Make grouping parameter available globally

Not really sure why it should be restricted to the citation or bibliography contexts?

Now I'm wondering if it all needs a bit of restructuring; say:

title: Foo
options:
  grouping:
templates:
citation:
  options:
  contexts:
    options:
  templates:

So consistent distinction between options and templates, available at all levels, where lower-levels override higher levels?
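
A small sketch of the "lower levels override higher levels" rule, assuming a shallow merge is enough. The `Options` shape and `resolveOptions` name are hypothetical; only the layering is the point.

```typescript
// Hypothetical option shape; the real option groups would be richer.
interface Options {
  grouping?: string;
  delimiter?: string;
}

// Later (more specific) levels override earlier (more general) ones.
function resolveOptions(...levels: Array<Options | undefined>): Options {
  return levels.reduce<Options>((acc, level) => ({ ...acc, ...level }), {});
}

const globalOptions: Options = { grouping: "author", delimiter: ", " };
const citationOptions: Options = { delimiter: "; " };

// Top-level first, then the citation level.
const effective = resolveOptions(globalOptions, citationOptions);
```

Here `effective` keeps the global grouping and takes the citation-level delimiter. Nested option groups would need a deeper merge than this.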

Or maybe the current distinction between templates (available only globally, and in external files) and formats (available only at lower levels), makes sense?

Or to go back to the "flatter" idea, maybe options is only available at the top-level, and lower-level parameters are also configured there? It's easier to program, I'd think.

Confirm deno compatibility

Both deno and bun are much faster on the simple main.ts app than node, so seems sensible to try to retain compatibility across them.

Minimal

With #91, I took a step toward deno compatibility, which may be sufficient for now.

I think this works as expected, but need to test a bit more:

> import { Processor } from "https://cdn.jsdelivr.net/gh/bdarcus/csl-next.js/src/processor.ts";
undefined
> const citeProc = new Processor();
undefined
> citeProc
Processor { style: undefined, bibliography: undefined }

See:

https://github.com/denoland/fresh

https://github.com/denoland/dnt

May want to add an import map, and support for it in esbuild?

https://github.com/trygve-lie/esbuild-plugin-import-map

Maximal

Migrate to a deno-first approach entirely.

https://github.com/bdarcus/csln-deno-test

Here the typescript is targeting deno, but I use dnt to transform it into an npm module.

The advantage of this setup:

It's an all-in-one tool; not only an engine, but a linter, formatter, AND test runner, etc. So while I simplified here, this would go farther.
They're all really fast.
It runs typescript directly.
It has experimental integrated KV storage (local SQLite and distributed FoundationDB), which would be perfect for storing the InputBibliography data, and perhaps other things (styles, templates, terms, abbreviations), since the values are just JSON objects.

ESM design

Since I've switched to Deno, the approach probably changes a bit.

These projects may present some ideas:

https://github.com/lumeland/lume

This article suggests:

For new packages, don’t publish index or library modules. Publish a module for each package export that can be deep imported. It’s perfectly valid to not have a main field in the package.json.

For packages with index or library modules, remove them in a SemVer major release. To help users migrate, in the changelog entry and GitHub release markdown use diff code blocks to show exactly how each possible import in a project should be updated.

Here's a project by the author that implements the ideas: https://github.com/jaydenseric/test-director.

https://stackoverflow.com/a/43951115/13860420

https://www.typescriptlang.org/docs/handbook/modules.html#default-exports

This is related to a big question about the processor model: whether I want to continue with the current Class-based approach.

If I don't, that may suggest reorganizing something like the below, where each exports a default function:

src/
  processor/
    sort.ts
    group.ts
    date.ts
    title.ts
    locators.ts
    contributors.ts