bdarcus / csl-next
An experimental reimagining of CSL
License: Mozilla Public License 2.0
If dates can work via parameters, perhaps contributors can also, at least as default behavior?
This is a newer feature in JS for more general list formatting.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/ListFormat
const vehicles = ['Motorcycle', 'Bus', 'Car'];
const formatter = new Intl.ListFormat('en', { style: 'long', type: 'conjunction' });
console.log(formatter.format(vehicles));
// Expected output: "Motorcycle, Bus, and Car"
For us, type would always be "conjunction", so the option reduces to a Boolean.
Contributors is just a generic list, then, so can use generic parameters for that, plus something for personal names, and labels.
contributors:
  andAs: symbol
  nameAsSort: first
  labelWrap: parentheses
So I actually think we're covered here; the JS code just might be useful for implementation.
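A minimal sketch of how a generic list parameter such as andAs might drive contributor formatting. The AndAs values other than "symbol", the ListOptions shape, and the formatList helper are assumptions for illustration, not the project's API; where a localized conjunction is enough, Intl.ListFormat could replace the hand-rolled join.

```typescript
// Hypothetical option shape; only `andAs: symbol` appears in the example above.
type AndAs = "text" | "symbol" | "none";

interface ListOptions {
  andAs: AndAs;
  delimiter?: string; // assumed default: ", "
}

// Join display names, placing the conjunction before the last item.
function formatList(names: string[], options: ListOptions): string {
  const delimiter = options.delimiter ?? ", ";
  if (names.length <= 1 || options.andAs === "none") {
    return names.join(delimiter);
  }
  const and = options.andAs === "symbol" ? "&" : "and";
  const head = names.slice(0, -1).join(delimiter);
  const last = names[names.length - 1];
  // Two names take a plain space before the conjunction; three or more
  // keep the delimiter (the serial-comma convention).
  const separator = names.length === 2 ? " " : delimiter;
  return `${head}${separator}${and} ${last}`;
}

console.log(formatList(["Doe", "Smith", "Jones"], { andAs: "symbol" }));
// "Doe, Smith, & Jones"
```

Whether the serial comma is kept would itself be a style parameter in practice; it is hard-coded here to keep the sketch short.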
Style is:
See also:
I'm finding the node/js ecosystem overwhelming, but am thinking I may want to standardize on newer features (like ES modules) and build packages (esbuild), since this is new, and experimental.
This starter kit might be helpful for that; it sets up a build system based on esbuild, adds GitHub CI, linting, etc., along with a working app with example src tree organization.
https://github.com/FreekMencke/node-ts-starter-cli
Maybe use rome instead of prettier and eslint, though?
❯ npx node-ts-starter-cli create csl-next-ts -g -l -p
So it should be easy to adapt the current source structure to this, perhaps keeping a version of the basic app that can improve as the code gets implemented. E.g.:
Maybe along with this (or maybe bun?) for running ts directly?
https://www.npmjs.com/package/tsx
Here's the src tree it creates:
❯ tree src
src
├── app
│   ├── app.ts
│   └── common
│       └── logger.ts
├── config
│   └── config.ts
├── main.ts
└── typings
    └── typings.d.ts
{
  "name": "csl-next.js",
  "version": "0.0.1",
  "license": "MIT",
  "scripts": {
    "build": "node ./build/esbuild.js --dev",
    "build:meta": "node ./build/esbuild.js --dev --meta",
    "build:meta:prod": "node ./build/esbuild.js --meta",
    "build:prod": "node ./build/esbuild.js",
    "lint": "rome check . .ts,.js",
    "format": "rome format --write .",
    "format:ci": "rome ci .",
    "start": "node ./build/esbuild.js --dev --watch --run",
    "start:ci": "node ./build/esbuild.js --run",
    "start:prod": "node ./build/esbuild.js --watch --run"
  },
  "prettier": "./.prettierrc.json",
  "devDependencies": {
    "@es-exec/esbuild-plugin-start": "^0.0.4",
    "@types/node": "^18.14.1",
    "@typescript-eslint/eslint-plugin": "^5.53.0",
    "@typescript-eslint/parser": "^5.53.0",
    "edtf": "^4.4.1",
    "esbuild": "^0.17.10",
    "minimist": "^1.2.8",
    "rome": "^12.0.0",
    "typescript": "^4.9.5"
  }
}
Premature to worry about ATM, but the two big ones recently are deno and bun.
Both support typescript out of the box, and both prioritize performance.
The former is different enough, however, that I don't see it likely worth worrying about any compatibility, at least not for a while.
But bun aims to be a drop-in replacement for node. And it is extremely fast.
As for compatibility, this currently errors:
❯ bun run ./build/esbuild.js --dev --watch --run
✘ [ERROR] Expected value for define "VERSION" to be a string, got undefined instead
But this runs fine.
❯ bun run src/main.ts
It's premature for the processor prototype, since it doesn't do anything, but docs can provide a useful view of the models. For that reason, should integrate a make option that generates docs for style, citation, bibliography.
I find typedoc pretty nice; example usage here:
typedoc src/style.ts src/citation.ts src/bibliography
Here's a screenshot of its output.
See config options.
There are extensions too, like this one:
https://www.npmjs.com/package/typedoc-umlclass
Also, not sure what to make of this:
❯ typedoc src/style.ts
[warning] AffixType, defined in ./src/style.ts, is referenced by Group.affixes but not included in the documentation.
[warning] TemplateModel, defined in ./src/style.ts, is referenced by NamedTemplate.template but not included in the documentation.
[warning] SubstitutionType, defined in ./src/style.ts, is referenced by OptionGroup.substitute but not included in the documentation.
[warning] GroupAffixLevel, defined in ./src/style.ts, is referenced by RefList.groupAffixLevel but not included in the documentation.
[warning] Bibliography, defined in ./src/style.ts, is referenced by Style.bibliography but not included in the documentation.
[warning] CategoryType, defined in ./src/style.ts, is referenced by Style.categories but not included in the documentation.
[warning] Citation, defined in ./src/style.ts, is referenced by Style.citation but not included in the documentation.
This description of how the org-cite CSL processor parses strings to extract lists of locators is excellent.
Seems to have derived from citeproc-org.
We should add it somewhere to CSL, if we haven't already.
;; CSL styles recognize "locator" in citation references' suffix. For example,
;; in the citation
;;
;; [cite:see @Tarski-1965 chapter 1, for an example]
;;
;; "chapter 1" is the locator. The whole citation is rendered as
;;
;; (see Tarski 1965, chap. 1 for an example)
;;
;; in the default CSL style.
;;
;; The locator starts with a locator term, among "bk.", "bks.", "book", "chap.",
;; "chaps.", "chapter", "col.", "cols.", "column", "figure", "fig.", "figs.",
;; "folio", "fol.", "fols.", "number", "no.", "nos.", "line", "l.", "ll.",
;; "note", "n.", "nn.", "opus", "op.", "opp.", "page", "p.", "pp.", "paragraph",
;; "para.", "paras.", "¶", "¶¶", "§", "§§", "part", "pt.", "pts.", "section",
;; "sec.", "secs.", "sub verbo", "s.v.", "s.vv.", "verse", "v.", "vv.",
;; "volume", "vol.", and "vols.". It ends with the last comma or digit in the
;; suffix, whichever comes last, or runs till the end of the suffix.
;;
;; The part of the suffix before the locator is appended to reference's prefix.
;; If no locator term is used, but a number is present, then "page" is assumed.
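The rules in that comment can be sketched roughly as follows. This is an illustration under assumptions, not the org-cite or citeproc-org implementation: only a subset of the listed terms is included, the canonical labels and the ParsedSuffix shape are made up for the example, and matching is case-sensitive.

```typescript
// Subset of the locator terms listed above, mapped to an assumed canonical label.
const LOCATOR_TERMS: Record<string, string> = {
  "chapter": "chapter", "chap.": "chapter", "chaps.": "chapter",
  "page": "page", "p.": "page", "pp.": "page",
  "section": "section", "sec.": "section", "secs.": "section",
  "volume": "volume", "vol.": "volume", "vols.": "volume",
};

// Hypothetical result shape.
interface ParsedSuffix {
  prefixExtra: string; // text before the locator, to append to the prefix
  label?: string;      // canonical locator label, if a locator was found
  locator?: string;    // locator value, e.g. "1" or "23-25"
  suffix: string;      // text remaining after the locator
}

// Find the earliest locator term that stands as its own word.
function findTerm(text: string): { index: number; term: string } | undefined {
  let best: { index: number; term: string } | undefined;
  for (const term of Object.keys(LOCATOR_TERMS)) {
    let from = 0;
    for (;;) {
      const i = text.indexOf(term, from);
      if (i === -1) break;
      const before = i === 0 ? " " : text.charAt(i - 1);
      const after = text.charAt(i + term.length) || " ";
      if (/\s/.test(before) && /[\s\d]/.test(after)) {
        if (!best || i < best.index) best = { index: i, term };
        break;
      }
      from = i + 1;
    }
  }
  return best;
}

function parseSuffix(raw: string): ParsedSuffix {
  const found = findTerm(raw);
  const firstDigit = raw.search(/\d/);
  let start: number;      // where the locator (term included) starts
  let valueStart: number; // where the locator value starts
  let label: string;
  if (found) {
    start = found.index;
    valueStart = found.index + found.term.length;
    label = LOCATOR_TERMS[found.term] ?? "page";
  } else if (firstDigit !== -1) {
    // No locator term, but a number is present: assume "page".
    start = firstDigit;
    valueStart = firstDigit;
    label = "page";
  } else {
    return { prefixExtra: "", suffix: raw.trim() };
  }
  // The locator ends at the last comma or digit, or runs to the end.
  let end = raw.length;
  for (let i = raw.length - 1; i >= valueStart; i--) {
    if (/[,\d]/.test(raw.charAt(i))) {
      end = i + 1;
      break;
    }
  }
  return {
    prefixExtra: raw.slice(0, start).trim(),
    label,
    locator: raw.slice(valueStart, end).replace(/,\s*$/, "").trim(),
    suffix: raw.slice(end).trim(),
  };
}

console.log(parseSuffix("chapter 1, for an example"));
// { prefixExtra: "", label: "chapter", locator: "1", suffix: "for an example" }
```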
Generally, I've settled on:
InputReference
Also, set default values for all parameter options, and maybe interface fields.
https://www.typescripttutorial.net/typescript-tutorial/typescript-default-parameters/
https://bobbyhadz.com/blog/typescript-interface-default-values
https://timmousk.com/blog/typescript-interface-default-value/
This is currently inconsistently implemented.
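One consistent pattern for this, as a sketch: keep a single defaults object per option group and merge style-supplied values over it. The ContributorOptions fields and their values are hypothetical, chosen only to resemble the parameters discussed here.

```typescript
// Hypothetical option group; field names and values are assumptions.
interface ContributorOptions {
  andAs?: "text" | "symbol";
  nameAsSort?: "first" | "all" | "none";
}

// Every optional field gets exactly one default, kept in one place.
const contributorDefaults: Required<ContributorOptions> = {
  andAs: "text",
  nameAsSort: "none",
};

// Merge style-supplied options over the defaults; later spreads win.
function resolveOptions(options: ContributorOptions = {}): Required<ContributorOptions> {
  return { ...contributorDefaults, ...options };
}

console.log(resolveOptions({ andAs: "symbol" }));
// { andAs: "symbol", nameAsSort: "none" }
```

One caveat with this approach: a key explicitly set to undefined in the input still overrides the default at runtime, so input should be cleaned or validated first.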
I added AJV, intending to use it for validation of the input.
But it looks like it may have much more:
https://ajv.js.org/guide/typescript.html
See also:
https://github.com/grantila/suretype
Probably also important to integrate in the CI, for the examples, once we also incorporate the deno task schemas there:
clean needs to be expanded
As preface, I've been experimenting in parallel with a Rust-based implementation here:
https://github.com/bdarcus/csln
I'm leaning towards prioritizing that going forward, but plan to align the models. But the success of either project will depend on contributions from others, so we'll see how they go.
At this point, I'm confident in the direction of the basic style model, though the details are in need of wider review and testing.
I have, however, checked my assumptions against what I'm able to glean from the existing style repository using ripgrep.
Doing so shows that in the areas where I've moved logic from templates/macros to parameters, styles show a lot of duplication, which suggests in retrospect that level of control is not needed in the template language itself.
So I've set up some obvious tentative milestones, and am already ahead of that schedule.
Ideally if this works out, I can transfer this project to the CSL GitHub org, and it can be team managed and developed further.
If you are interested in potentially submitting a PR, let me know before you start; I've been somewhat liberal about rewriting the git history of the main branch so far (though I've avoided this more recently, as things are looking more stable)!
PS - having no previous experience with the js/node ecosystem, I've just opted to use tools here that bring me joy 😊
They're generally easy-to-use, with good UIs, high performance, and minimal dependencies.
Quicktype seems to do a pretty good job of generating model code from JSON schemas to different languages, most notably Rust, Haskell, and Go.
It even includes parsing code in the generated code!
But I just realized it also has "experimental" support for direct conversion from typescript.
To assess, then, typescript-json-schema vs quicktype for:
Basically, it may be possible to remove the first, and standardize on the second.
I'm trying to find some sort of library or tool that can programmatically evaluate schema quality and performance, but have not had any luck so far.
A quick grep for "$ref" shows equivalent numbers, but numbers for metadata properties are higher for quicktype. Need to check what that means.
Both validate the same.
But tjs has lots of annotation options to customize output. Does quicktype?
Adapted some ideas from node-ts-starter-cli.
PS - been experimenting with lefthook for commit hooks. This matches the CI, and formats staged files:
# lefthook.yml
# lefthook add pre-commit
pre-commit:
  commands:
    format:
      glob: "*.ts"
      run: npx rome format --write {staged_files}
Follow-up to #27.
Add an optional property to named templates that signals the template to use for x substitution.
What to name it, though?
Configs:
const date = new Date(2020, 11, 19); // 19 December 2020 (months are zero-based)
console.log(new Intl.DateTimeFormat('en-US', { month: 'long', day: 'numeric' }).format(date));
// "December 19"
console.log(new Intl.DateTimeFormat('es', { dateStyle: 'long' }).format(date));
// "19 de diciembre de 2020"
console.log(new Intl.DateTimeFormat('en-GB', { dateStyle: 'long' }).format(date));
// "19 December 2020"
console.log(new Intl.DateTimeFormat('en-US', { dateStyle: 'long' }).format(date));
// "December 19, 2020"
EDTF.js is strict about EDTF string parsing; it will error if input is not valid.
So I'll probably have to first confirm input is valid (using isEDTF() or something similar) and only run edtf() if it is, and pass it through if not.
And while toLocaleDateString will work with standard date-times, it won't with intervals. So start with the former, and worry about the rest later.
[...edtf('2001/2002-08~').values].map(d => format(d, 'en')).join(' until ')
//-> '2001 until ca. 8/2002'
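Going back to the strict-parsing point, the "parse if valid, otherwise pass through" idea could be sketched like this. The parser is injected as a parameter so the example stays self-contained; strictYearMonth is a hypothetical stand-in that only throws on bad input the way a strict parser would, and is not the EDTF.js API.

```typescript
// Result is either a parsed value or the untouched input string.
type DateResult<T> =
  | { kind: "parsed"; value: T }
  | { kind: "literal"; value: string };

// Try the strict parser; on any error, fall back to the raw string.
function parseOrPassThrough<T>(input: string, parse: (s: string) => T): DateResult<T> {
  try {
    return { kind: "parsed", value: parse(input) };
  } catch {
    return { kind: "literal", value: input };
  }
}

interface YearMonth {
  year: number;
  month: number | undefined;
}

// Hypothetical stand-in parser: accepts only "YYYY" or "YYYY-MM".
function strictYearMonth(input: string): YearMonth {
  const m = /^(\d{4})(?:-(\d{2}))?$/.exec(input);
  if (m === null) throw new Error(`invalid date string: ${input}`);
  return {
    year: Number(m[1]),
    month: m[2] === undefined ? undefined : Number(m[2]),
  };
}

console.log(parseOrPassThrough("2016-10", strictYearMonth));
// { kind: "parsed", value: { year: 2016, month: 10 } }
console.log(parseOrPassThrough("n.d.", strictYearMonth));
// { kind: "literal", value: "n.d." }
```

The tagged result lets the renderer decide later how to display a literal, rather than losing the distinction at parse time.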
I've already added this to the Style options, but am now getting to implementing.
Perhaps the "format" piece could map onto this lookup const?
import edtf from "edtf";

const d1 = edtf("2016-10");
// Intl accepts "long", "short", "narrow", "numeric", or "2-digit" for month;
// "full" is not a valid value and throws a RangeError.
const date_config = {
  month: "long",
} as const;
// use these, and lookup long vs short in the date options?
const dateFormats: Record<string, Intl.DateTimeFormatOptions> = {
  year: { year: "numeric" },
  monthDay: { month: date_config.month, day: "numeric" },
  // year added here so "full" differs from "monthDay"
  full: { year: "numeric", month: date_config.month, day: "numeric" },
};
console.log(d1.toLocaleDateString("en-us", dateFormats.monthDay));
console.log(d1.toLocaleDateString("en-us", dateFormats.full));
console.log(d1.year);
Try deno task docs for documentation.
I estimate the model is roughly 80 percent complete, but I'm increasingly convinced it's a solid foundation.
I'm a TS newbie. While I now better understand the details of how to model with it, this could still use review from people more knowledgeable on the technical details.
At a high-level, there are extensible groups of parameters ("options"), and there are templates.
Templates can be inline, or referenced, as in CSL 1.0.
So far, there's nothing in that description that is any different than 1.0, other than names (and that notion of "extensible").
But the first change is named templates can also be contained in external template files, which is a minor change we could apply to 1.0 also.
The more fundamental change is that I am putting much more logic in the parameters, and trying to keep it out of the templates.
TemplateModel is currently defined as follows, with separate interfaces for different data types.
type TemplateModel =
  | RenderList
  | RenderItemTemplate
  | RenderItemSimple
  | RenderItemContributorList
  | RenderItemLocatorList
  | RenderItemDate
  | RenderItemTitle
  | Cond;
Templates are just flexible lists of objects (RenderItem), lists/arrays (RenderList), and a conditional (Cond).
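To make that shape concrete, here is a heavily simplified sketch of a recursive renderer over such a union. The three interfaces are cut-down stand-ins with assumed field names (the real ones carry many more properties), and the Template type and render function are hypothetical.

```typescript
// Simplified stand-ins for the interfaces named above; fields are assumptions.
interface RenderItemSimple {
  variable: string;
}
interface RenderList {
  delimiter?: string;
  format: Template[];
}
interface Cond {
  when: { hasVariable: string; format: Template[] }[];
  else?: Template[];
}
type Template = RenderItemSimple | RenderList | Cond;

type Reference = Record<string, string>;

// Render each child, drop empty results, and join the rest.
function renderAll(templates: Template[], ref: Reference, delimiter: string): string {
  return templates
    .map((t) => render(t, ref))
    .filter((s) => s !== "")
    .join(delimiter);
}

// The `in` checks narrow the union to one member per branch.
function render(template: Template, ref: Reference): string {
  if ("variable" in template) {
    return ref[template.variable] ?? "";
  }
  if ("when" in template) {
    // Use the first branch whose variable is present, else the fallback.
    const branch = template.when.find((c) => c.hasVariable in ref);
    const chosen = branch ? branch.format : (template.else ?? []);
    return renderAll(chosen, ref, "");
  }
  return renderAll(template.format, ref, template.delimiter ?? "");
}

const example: Template = {
  delimiter: ", ",
  format: [
    { variable: "author" },
    {
      when: [{ hasVariable: "issued", format: [{ variable: "issued" }] }],
      else: [{ variable: "status" }],
    },
  ],
};

console.log(render(example, { author: "Doe", issued: "2020" })); // "Doe, 2020"
console.log(render(example, { author: "Doe", status: "n.d." })); // "Doe, n.d."
```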
In CSL 1.0, for example, we have cs:layout, and there can only be one for each cs:citation and cs:bibliography element, and only one of each of those.
This model throws that out, and one can use a top-level Cond structure to support different features, like local citation commands or styles, or multilingual, without changing the basic model (aside from adding a new Condition property).
There remains a single citation and bibliography property in a Style, but they each are much more flexible.
Here's the citation definition, also explicitly defined as a List:
citation?: RenderItemCitationList;
Effectively how I'm defining the OptionGroup interfaces would be equivalent to allowing foreign attributes in CSL 1.0.
The idea is to allow extension in an area that wouldn't break parsers, so evolution going forward is easier.
Something similar could be done in 1.0, but would certainly be more difficult, both to implement the schema changes, and to update processors and styles.
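As a rough illustration of that kind of extensibility in TS: known options stay typed, while an index signature lets unknown keys pass through instead of being rejected. The ExtensibleOptions interface, the key list, and the helper are hypothetical, not the project's actual OptionGroup.

```typescript
// Known options are typed; any other key is accepted as an extension.
interface ExtensibleOptions {
  sort?: string[];
  group?: string[];
  [extension: string]: unknown;
}

// Assumed list of keys this processor understands.
const KNOWN_KEYS: string[] = ["sort", "group"];

// A processor can ignore, or merely report, keys it does not understand.
function extensionKeys(options: ExtensibleOptions): string[] {
  return Object.keys(options).filter((key) => KNOWN_KEYS.indexOf(key) === -1);
}

const options: ExtensibleOptions = {
  sort: ["author"],
  "x-custom": true, // hypothetical extension key
};

console.log(extensionKeys(options)); // ["x-custom"]
```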
One reason to do the modeling in TS is it cleanly converts to JSON Schema, and I'm pretty certain to other languages (Rust, etc.).
Typescript has a third-party library that I am using here to auto-convert these models to what appear to be compliant and well-defined (if verbose) JSON Schemas.
This tool also appears, at first glance, to do a reasonable job of converting JSON Schema (with experimental support for typescript itself) to model code in different languages, including Rust, Swift, and Haskell.
There's also this for TS to Lua:
https://typescripttolua.github.io/
Here's the style schema converted to Rust (which does compile without adjustment):
// Example code that deserializes and serializes the model.
// extern crate serde;
// #[macro_use]
// extern crate serde_derive;
// extern crate serde_json;
//
// use generated_module::Style;
//
// fn main() {
// let json = r#"{"answer": 42}"#;
// let model: Style = serde_json::from_str(&json).unwrap();
// }
use serde::{Serialize, Deserialize};
/// A CSL Style.
#[derive(Serialize, Deserialize)]
pub struct Style {
/// The bibliography specification.
#[serde(rename = "bibliography")]
bibliography: Option<Bibliography>,
/// The categories the style belongs to; for purposes of indexing.
#[serde(rename = "categories")]
categories: Option<Vec<CategoryType>>,
/// The citation specification.
#[serde(rename = "citation")]
citation: Option<Citation>,
/// The description of the style.
#[serde(rename = "description")]
description: Option<String>,
/// The machine-readable token that uniquely identifies the style.
#[serde(rename = "id")]
id: Option<String>,
/// Global parameter options.
#[serde(rename = "options")]
options: Option<OptionGroup>,
/// The templates for rendering the bibliography and citations.
#[serde(rename = "templates")]
templates: Option<Vec<NamedTemplate>>,
/// The human-readable name of the style.
#[serde(rename = "title")]
title: Option<String>,
}
/// The bibliography specification.
#[derive(Serialize, Deserialize)]
pub struct Bibliography {
#[serde(rename = "bold")]
bold: Option<bool>,
/// The string with which to join two or more rendering components.
#[serde(rename = "delimiter")]
delimiter: Option<String>,
#[serde(rename = "emph")]
emph: Option<bool>,
/// The rendering instructions; either called template name, or inline instructions.
#[serde(rename = "format")]
format: Option<BibliographyFormat>,
#[serde(rename = "heading")]
heading: Option<String>,
#[serde(rename = "listStyle")]
list_style: Option<String>,
#[serde(rename = "options")]
options: Option<OptionGroup>,
/// The symbol pair to wrap around one or more rendering components.
/// Interaction with surrounding punctuation is localized.
#[serde(rename = "wrap")]
wrap: Option<WrapType>,
}
#[derive(Serialize, Deserialize)]
pub struct Condition {
/// When a match, process these templates.
#[serde(rename = "format")]
format: Vec<TemplateModel>,
/// Is the item variable a number?
#[serde(rename = "isNumber")]
is_number: Option<LocatorType>,
/// A list of reference item types; if one is true, then return true.
#[serde(rename = "match")]
condition_match: Option<MatchType>,
/// Does the date conform to EDTF?
#[serde(rename = "isEDTFDate")]
is_edtf_date: Option<DateType>,
/// Is the item reference type among the listed reference types?
#[serde(rename = "isRefType")]
is_ref_type: Option<Vec<RefType>>,
/// Does the item reference include one of the listed variables?
#[serde(rename = "hasVariable")]
has_variable: Option<Vec<VariableType>>,
/// The item reference locale; to allow multilingual output.
#[serde(rename = "locale")]
locale: Option<String>,
}
/// A template that is defined inline.
///
/// Integral citations are those where the author is printed inline in the text; aka "in
/// text" or "narrative" citations.
///
/// Non-integral citations are those where the author is incorporated in the citation, and
/// not printed inline in the text.
#[derive(Serialize, Deserialize)]
pub struct TemplateModel {
#[serde(rename = "bold")]
bold: Option<bool>,
/// The string with which to join two or more rendering components.
#[serde(rename = "delimiter")]
delimiter: Option<String>,
#[serde(rename = "emph")]
emph: Option<bool>,
/// The rendering instructions; either called template name, or inline instructions.
#[serde(rename = "format")]
format: Option<TemplateModelFormat>,
#[serde(rename = "options")]
options: Option<OptionGroup>,
/// The symbol pair to wrap around one or more rendering components.
/// Interaction with surrounding punctuation is localized.
#[serde(rename = "wrap")]
wrap: Option<WrapType>,
/// The template name to use for partial formatting.
#[serde(rename = "template")]
template: Option<String>,
#[serde(rename = "variable")]
variable: Option<Type>,
/// When all of the when conditions are nil, format the children.
#[serde(rename = "else")]
template_model_else: Option<Vec<TemplateModel>>,
/// For the first condition that is non-nil, format the children.
#[serde(rename = "when")]
when: Option<Vec<Condition>>,
}
/// Parameter groups.
///
/// Global parameter options.
#[derive(Serialize, Deserialize)]
pub struct OptionGroup {
/// Date formatting configuration.
#[serde(rename = "dateFormatting")]
date_formatting: Option<DateFormatting>,
/// Disambiguation configuration of rendered group display names.
#[serde(rename = "disambiguate")]
disambiguate: Option<Disambiguation>,
/// Grouping configuration.
#[serde(rename = "group")]
group: Option<Vec<GroupSortType>>,
/// Localization configuration.
#[serde(rename = "localization")]
localization: Option<Localization>,
/// Sorting configuration.
#[serde(rename = "sort")]
sort: Option<Vec<Sort>>,
/// Substitution configuration.
#[serde(rename = "substitute")]
substitute: Option<Substitution>,
}
/// Date formatting configuration.
#[derive(Serialize, Deserialize)]
pub struct DateFormatting {
#[serde(rename = "date")]
date: Option<EStyle>,
#[serde(rename = "month")]
month: Option<MonthStyle>,
#[serde(rename = "time")]
time: Option<EStyle>,
#[serde(rename = "year")]
year: Option<YearStyle>,
}
/// Disambiguation configuration of rendered group display names.
///
/// Disambiguation of rendered group display name configuration.
#[derive(Serialize, Deserialize)]
pub struct Disambiguation {
#[serde(rename = "addNames")]
add_names: Option<AddNames>,
#[serde(rename = "addYearSuffix")]
add_year_suffix: Option<bool>,
}
/// Localization configuration.
///
/// Terms and data localization configuration.
#[derive(Serialize, Deserialize)]
pub struct Localization {
/// The scope to use for localization.
///
/// "per-item" uses the locale of the reference item, and "global" uses the target language
/// across all references.
#[serde(rename = "scope")]
scope: Option<Scope>,
}
/// Reference sorting configuration.
#[derive(Serialize, Deserialize)]
pub struct Sort {
#[serde(rename = "key")]
key: GroupSortType,
#[serde(rename = "order")]
order: Order,
}
/// Substitution configuration.
///
/// Substitution of variable configuration.
#[derive(Serialize, Deserialize)]
pub struct Substitution {
/// When author is nil, substitute the first non-nil listed variable.
/// Once a substitution is made, the substituted variable shall be set to nil for purposes of
/// later rendering.
#[serde(rename = "author")]
author: Vec<SubstitutionType>,
}
/// The citation specification.
#[derive(Serialize, Deserialize)]
pub struct Citation {
#[serde(rename = "bold")]
bold: Option<bool>,
/// The string with which to join two or more rendering components.
#[serde(rename = "delimiter")]
delimiter: Option<String>,
#[serde(rename = "emph")]
emph: Option<bool>,
/// The rendering instructions; either called template name, or inline instructions.
#[serde(rename = "format")]
format: Option<BibliographyFormat>,
/// Integral citations are those where the author is printed inline in the text; aka "in
/// text" or "narrative" citations.
#[serde(rename = "integral")]
integral: Option<RenderList>,
/// Non-integral citations are those where the author is incorporated in the citation, and
/// not printed inline in the text.
#[serde(rename = "nonIntegral")]
non_integral: Option<RenderList>,
#[serde(rename = "options")]
options: Option<OptionGroup>,
#[serde(rename = "placement")]
placement: Option<Placement>,
/// The symbol pair to wrap around one or more rendering components.
/// Interaction with surrounding punctuation is localized.
#[serde(rename = "wrap")]
wrap: Option<WrapType>,
}
/// Integral citations are those where the author is printed inline in the text; aka "in
/// text" or "narrative" citations.
///
/// Non-integral citations are those where the author is incorporated in the citation, and
/// not printed inline in the text.
#[derive(Serialize, Deserialize)]
pub struct RenderList {
#[serde(rename = "bold")]
bold: Option<bool>,
/// The string with which to join two or more rendering components.
#[serde(rename = "delimiter")]
delimiter: Option<String>,
#[serde(rename = "emph")]
emph: Option<bool>,
/// The rendering instructions; either called template name, or inline instructions.
#[serde(rename = "format")]
format: Option<BibliographyFormat>,
#[serde(rename = "options")]
options: Option<OptionGroup>,
/// The symbol pair to wrap around one or more rendering components.
/// Interaction with surrounding punctuation is localized.
#[serde(rename = "wrap")]
wrap: Option<WrapType>,
}
#[derive(Serialize, Deserialize)]
pub struct NamedTemplate {
/// The name token for the template, for reference from other templates.
#[serde(rename = "name")]
name: String,
#[serde(rename = "options")]
options: Option<OptionGroup>,
#[serde(rename = "template")]
template: Vec<TemplateModel>,
}
#[derive(Serialize, Deserialize)]
#[serde(untagged)]
pub enum BibliographyFormat {
String(String),
TemplateModelArray(Vec<TemplateModel>),
}
#[derive(Serialize, Deserialize)]
#[serde(untagged)]
pub enum TemplateModelFormat {
String(String),
TemplateModelArray(Vec<TemplateModel>),
}
/// A list of reference item types; if one is true, then return true.
#[derive(Serialize, Deserialize)]
pub enum MatchType {
#[serde(rename = "all")]
All,
#[serde(rename = "any")]
Any,
#[serde(rename = "none")]
None,
}
#[derive(Serialize, Deserialize)]
pub enum VariableType {
#[serde(rename = "article")]
Article,
#[serde(rename = "author")]
Author,
#[serde(rename = "book")]
Book,
#[serde(rename = "chapter")]
Chapter,
#[serde(rename = "container-title")]
ContainerTitle,
#[serde(rename = "editor")]
Editor,
#[serde(rename = "issue")]
Issue,
#[serde(rename = "issued")]
Issued,
#[serde(rename = "pages")]
Pages,
#[serde(rename = "publisher")]
Publisher,
#[serde(rename = "title")]
Title,
#[serde(rename = "volume")]
Volume,
}
/// Does the date conform to EDTF?
#[derive(Serialize, Deserialize)]
pub enum DateType {
#[serde(rename = "issued")]
Issued,
}
/// Is the item variable a number?
#[derive(Serialize, Deserialize)]
pub enum LocatorType {
#[serde(rename = "chapter")]
Chapter,
#[serde(rename = "page")]
Page,
}
#[derive(Serialize, Deserialize)]
pub enum RefType {
#[serde(rename = "article")]
Article,
#[serde(rename = "book")]
Book,
#[serde(rename = "chapter")]
Chapter,
}
#[derive(Serialize, Deserialize)]
pub enum EStyle {
#[serde(rename = "full")]
Full,
#[serde(rename = "long")]
Long,
#[serde(rename = "medium")]
Medium,
#[serde(rename = "short")]
Short,
}
#[derive(Serialize, Deserialize)]
pub enum MonthStyle {
#[serde(rename = "long")]
Long,
#[serde(rename = "narrow")]
Narrow,
#[serde(rename = "numeric")]
Numeric,
#[serde(rename = "short")]
Short,
#[serde(rename = "2-digit")]
The2Digit,
}
#[derive(Serialize, Deserialize)]
pub enum YearStyle {
#[serde(rename = "numeric")]
Numeric,
#[serde(rename = "2-digit")]
The2Digit,
}
#[derive(Serialize, Deserialize)]
pub enum AddNames {
#[serde(rename = "all")]
All,
#[serde(rename = "all-with-initials")]
AllWithInitials,
#[serde(rename = "by-cite")]
ByCite,
#[serde(rename = "primary")]
Primary,
#[serde(rename = "primary-with-initials")]
PrimaryWithInitials,
}
#[derive(Serialize, Deserialize)]
pub enum GroupSortType {
#[serde(rename = "as-cited")]
AsCited,
#[serde(rename = "author")]
Author,
#[serde(rename = "title")]
Title,
#[serde(rename = "year")]
Year,
}
/// The scope to use for localization.
///
/// "per-item" uses the locale of the reference item, and "global" uses the target language
/// across all references.
#[derive(Serialize, Deserialize)]
pub enum Scope {
#[serde(rename = "global")]
Global,
#[serde(rename = "per-item")]
PerItem,
}
#[derive(Serialize, Deserialize)]
pub enum Order {
#[serde(rename = "ascending")]
Ascending,
#[serde(rename = "descending")]
Descending,
}
#[derive(Serialize, Deserialize)]
pub enum SubstitutionType {
#[serde(rename = "editor")]
Editor,
#[serde(rename = "title")]
Title,
#[serde(rename = "translator")]
Translator,
}
/// Is the item variable a number?
///
/// Does the date conform to EDTF?
#[derive(Serialize, Deserialize)]
pub enum Type {
#[serde(rename = "author")]
Author,
#[serde(rename = "chapter")]
Chapter,
#[serde(rename = "container-title")]
ContainerTitle,
#[serde(rename = "editor")]
Editor,
#[serde(rename = "issue")]
Issue,
#[serde(rename = "issued")]
Issued,
#[serde(rename = "page")]
Page,
#[serde(rename = "pages")]
Pages,
#[serde(rename = "publisher")]
Publisher,
#[serde(rename = "title")]
Title,
#[serde(rename = "volume")]
Volume,
}
/// The symbol pair to wrap around one or more rendering components.
/// Interaction with surrounding punctuation is localized.
#[derive(Serialize, Deserialize)]
pub enum WrapType {
#[serde(rename = "brackets")]
Brackets,
#[serde(rename = "parentheses")]
Parentheses,
#[serde(rename = "quotes")]
Quotes,
}
#[derive(Serialize, Deserialize)]
pub enum CategoryType {
#[serde(rename = "biology")]
Biology,
#[serde(rename = "science")]
Science,
#[serde(rename = "social science")]
SocialScience,
}
#[derive(Serialize, Deserialize)]
pub enum Placement {
#[serde(rename = "inline")]
Inline,
#[serde(rename = "note")]
Note,
}
I don't know Rust myself, but I'm thinking the type/interface model here is likely to translate pretty easily to its (or Swift's) structs, Haskell types, etc.
So am thinking this could be a useful reference implementation of a major breaking change, if it's warranted, while also confirming the reasonableness of any style format/model changes.
You can also play with the schemas if you like; here is VSCode, with schema-backed validation and auto-complete.
If one is grouping citations and/or bibliography by author, I think the right way to do that programmatically is to group on normalized full name representations rather than family names; like say:
doe-j:smith-s:jones-k
The behavior and the displayed group name therefore diverge, and we use family name as shorthand in display, which can create conflicts.
But the grouping behavior would be the same regardless of display details.
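A small sketch of that idea: build the grouping key from a normalized form of each name, independent of how the group is displayed. The key format follows the example above (family name plus first initial); the Name shape and helper names are assumptions.

```typescript
// Hypothetical minimal name shape.
interface Name {
  family: string;
  given: string;
}

// Normalize one name to "family-initial", lowercased, e.g. "doe-j".
function nameKey(name: Name): string {
  const initial = name.given.trim().charAt(0);
  return `${name.family.trim()}-${initial}`.toLowerCase();
}

// Join the per-name keys into one group key, e.g. "doe-j:smith-s".
function groupKey(names: Name[]): string {
  return names.map(nameKey).join(":");
}

// Bucket references by the normalized key of their author list.
function groupByAuthors<T extends { author: Name[] }>(refs: T[]): Map<string, T[]> {
  const groups = new Map<string, T[]>();
  for (const ref of refs) {
    const key = groupKey(ref.author);
    const bucket = groups.get(key);
    if (bucket) {
      bucket.push(ref);
    } else {
      groups.set(key, [ref]);
    }
  }
  return groups;
}

const refs = [
  { title: "A", author: [{ family: "Doe", given: "Jane" }] },
  { title: "B", author: [{ family: "Doe", given: "Robert" }] },
  { title: "C", author: [{ family: "Doe", given: "Jane" }] },
];

console.log([...groupByAuthors(refs).keys()]); // ["doe-j", "doe-r"]
```

Both groups would display as "Doe" by family name alone, which is exactly where display-level disambiguation comes in.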
So maybe here when we say "disambiguation" we really mean of group display names?
I have clarified this in the jsdocs.
It may suggest, however, moving this back to the GroupOption (though I prefer the simpler modeling)?
We need basic citation and bibliography formatting, without worrying about nuances like disambiguation, to assess whether this is likely to work.
If I'm right, it should be relatively easy, and generate reasonable results.
Running deno task cli shows the current state. In the end, I'm thinking that should be a CLI that works similarly to the Haskell-based citeproc: can take input of citations, references, and style, in JSON or YAML, and either:
For now, I have in mind rough priorities being in order:
I have some placeholder functions for the first 3 and 4, though they don't work. I think the parameter specs are right though.
Some ideas on template formatting here: #114 (comment).
I also have Contributor classes with some basic methods that work.
> const p1 = new Person("Doe", "Jane");
> p1.getSortName();
"Jane, Doe"
Seems typescript-json-schema is in maintenance mode, and the maintainer recommends this:
https://github.com/vega/ts-json-schema-generator
#83 may also be a possibility, though I doubt it.
At first thought, I think something like this is the right approach:
- term: available
  emph: true
- text: foo bar
Looking at the style repo, it seems this is overwhelmingly used for localized prefixes, in particular, for content.
That also requires using cs:group and such.
So maybe instead could do:
- template: container-apa
  prefixTerm: in
Or maybe:
- template: container-apa
  prefix:
    term: in
    emph: true
For rich subfield formatting, and also to format djot documents:
https://github.com/jgm/djot.js
This should also be easy, and fun.
import * as djot from "https://esm.sh/@djot/djot";
djot.renderHTML(djot.parse('_Title_ within a "title"'));
My question is mainly strategy (see #30); do we:
I guess the answer may depend on where and how it's integrated, but my impulse is to try 2 first.
Along with #2, the main entry-point should be extended to mirror the haskell citeproc one, but make sure it also aligns with #5.
https://www.makeuseof.com/nodejs-cli-packages-build-tools-best/
Deno-specific:
Or perhaps this should just be a library, and the CLI a separate project?
Regardless of those details, need a CLI.
Premature, but at some point, if this goes anywhere, we'll need to include id properties on the schemas, set some definitions to be extensible, etc.
https://github.com/YousefED/typescript-json-schema#command-line
Also see API docs, for additional annotations, including formats and examples.
https://github.com/YousefED/typescript-json-schema/blob/master/api.md#annotation-tjs
Also, compare to:
https://github.com/vega/ts-json-schema-generator
Both create pretty verbose schemas, which I haven't figured out how to rein in.
I lean towards template, though that does raise the singular (call) vs plural (inline) issue that needs more thought. Maybe in that case format is more generic?
See #36
Just occurred to me, so noting ...
A citekey is a unique identifier for a reference. In the current model, ID is a required property on Reference, which is collected in Bibliography under references, which is an array.
But an object or dictionary better matches the actual data:
references:
  doe1:
    title: Some Title
It's also more performant for lookup; this article, for example, concludes:
... working objects is comparatively much faster than the arrays when we don’t need orders.
Perhaps the input model should be the former, and a processor should simply transform it to a list for intermediate processing?
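A sketch of that transformation, assuming hypothetical InputReference and Reference shapes: the input is keyed by citekey, and the processor flattens it to a list, copying each key into an id field.

```typescript
// Hypothetical shapes: input entries have no id; the key is the citekey.
interface InputReference {
  title: string;
  [field: string]: unknown;
}
interface Reference extends InputReference {
  id: string;
}

// Flatten the keyed input into a list for intermediate processing.
function toReferenceList(references: Record<string, InputReference>): Reference[] {
  const list: Reference[] = [];
  for (const id of Object.keys(references)) {
    const ref = references[id];
    if (ref !== undefined) {
      list.push({ ...ref, id });
    }
  }
  return list;
}

const input: Record<string, InputReference> = {
  doe1: { title: "Some Title" },
  smith2: { title: "Another Title" },
};

// Direct lookup by citekey stays available on the input form.
console.log(input["doe1"]?.title); // "Some Title"
console.log(toReferenceList(input).map((r) => r.id)); // ["doe1", "smith2"]
```

A side benefit of the keyed form is that duplicate citekeys cannot occur in the input, so uniqueness needs no separate check.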
Aside: my initial implementation of what became CSL was XSLT 1.0, which is a purely functional language. So the processing there was all about transforming lists into other lists.
The TemplateModel and Cond structure is now pretty flexible, but potentially a bit confusing.
Here's what I think is a mostly correct way to handle configuring default and narrative citations.
citation:
format:
- when:
- mode: narrative
format:
- groupBy: cs-author
template: author-apa
delimiter: ", "
andAs: symbol
format:
- groupBy: cs-year
prefix: (
suffix: )
format:
- variable: issued
else:
- groupBy: cs-author-year # not sure this grouping is right
prefix: (
suffix: )
delimiter: "; "
format:
- template: author-apa
- variable: issued
Maybe we could define two kinds of conditions somehow, so maybe could do something like the below?
Only makes sense if we can clearly distinguish locale and mode from the other kinds of conditions? Maybe, in effect, both are kinds of modes?
---
citation:
- modes:
locale: es
citation: narrative
groupBy: cs-author
delimiter: ", "
andAs: symbol
format:
- template: author-apa
suffix: " "
format:
- groupBy: cs-year
prefix: (
suffix: )
variable: issued
In that model, there would be modes and conditions.
Does that distinction make sense though?
But then ...
if that makes sense, maybe can generalize that also?
citation:
- modes:
locale: es
citation: narrative
- predicates:
isDate: issued
isRefType: ["book"]
match: any
format:
...
The advantage is it's simpler syntax.
The disadvantage is it's less flexible.
Except, maybe if it was allowed in named templates also, that disadvantage goes away?
template:
name: title-abc
predicates:
locale: es
citation: narrative
isDate: issued
isRefType:
- book
match: all
format: ...
OTOH, not sure how much it buys, or at what costs.
@jgm, when you get a chance, curious what you suggest on integrating djot.js here, for both subfield formatting, and ultimately to integrate it into djot document processing.
Edit: thinking about this more, may need to both use djot.parse on the fields and work with the AST otherwise? Basically, I'm thinking of adding a toDjotAST method.
But if I need to transform content to the AST, what's the best way to do that? Any suggestions on where to look in djot.js for inspiration?
Note to self: the djot playground includes filters!
As part of the more general #5.
Templates (aka macros in 1.0) look like this:
- variable: author
emph: true
wrap: parentheses
[
{
"variable": "author",
"emph": true,
"wrap": "parentheses"
}
]
Basically, better to convert this to:
djot.parse()
handle the rest?
In 1, then, I'd run djot.parse() first, and in 2 last.
I know I can just create the AST from that, and then insert subfield content as children.
But one issue might be that this CSL model here is flat, while Djot is pretty hierarchical.
So with this, the emph is contained in that inner child object.
> djot.parse("hello *_there_*").children[0];
{
tag: "para",
children: [
{ tag: "str", text: "hello ", pos: undefined },
{
tag: "strong",
children: [ { tag: "emph", children: [Array], pos: undefined } ],
pos: undefined
}
],
attributes: undefined,
pos: undefined
}
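One way to bridge the flat CSL flags and the nested AST could look like the sketch below. The Inline type is a local approximation of the node shapes in the output above, not an import from djot.js, and FlatFormatting with its strong flag is a hypothetical stand-in for the template component model.

```typescript
// Minimal node shapes modeled on the djot.parse() output shown above;
// these are local illustrative types, not imports from djot.js.
type Inline =
  | { tag: "str"; text: string }
  | { tag: "emph" | "strong"; children: Inline[] };

// Hypothetical flat formatting flags, as on a CSL template component.
interface FlatFormatting {
  emph?: boolean;
  strong?: boolean;
}

// Wrap the rendered text in one node per active flag, innermost first.
function toInline(text: string, fmt: FlatFormatting): Inline {
  let node: Inline = { tag: "str", text };
  if (fmt.emph) node = { tag: "emph", children: [node] };
  if (fmt.strong) node = { tag: "strong", children: [node] };
  return node;
}

const node = toInline("there", { emph: true, strong: true });
console.log(JSON.stringify(node));
```

The nesting order (emph inside strong) is fixed by the function here; a real implementation would need to decide that order deliberately.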
I need to figure this out soon, as processing functionality builds on it.
Right now, I have Contributor as an abstract class, which Person and Organization "extend".
This is the right approach, I think, from a modeling POV, but the deserialization code, which uses class-transformer, currently introduces this one little limitation that bothers me: it requires a type property on the contributor:
- type: organization
name: United Nations
I am trying to figure out a way around that, or at least to define a default when not present, but if that doesn't work, the alternative is a single contributor class.
class Contributor {
name?: string;
familyName?: string;
givenName?: string;
location?: string;
...
Complete example here, that works, but I'm not particularly fond of it.
https://gist.github.com/bdarcus/59d6d90783f29511a6551a19b7fca7bb
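Another option, sketched here without class-transformer, is to model contributors as a union of plain interfaces and infer the kind from the properties present, so no type property is needed in the data. The rule "anything with a familyName is a person" is an assumption for illustration.

```typescript
// Hypothetical plain-data shapes; property names follow the class sketch above.
interface Person {
  familyName: string;
  givenName?: string;
}

interface Organization {
  name: string;
  location?: string;
}

type Contributor = Person | Organization;

// Type guard: treat anything with a familyName as a person (an assumption).
function isPerson(c: Contributor): c is Person {
  return "familyName" in c;
}

function displayName(c: Contributor): string {
  return isPerson(c)
    ? [c.givenName, c.familyName].filter(Boolean).join(" ")
    : c.name;
}

const contributors: Contributor[] = [
  { name: "United Nations" },
  { familyName: "Doe", givenName: "Jane" },
];
console.log(contributors.map(displayName));
```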
Or could ditch the class, and use an interface with functions.
E.g.:
const citeProc = Processor(data, style, "en-US");
On a related note, will need some way to indicate extra template files to load.
Maybe a templateCollections property that takes an array of string tokens?
Would still need:
Split into multiple properties by input data type.
Merged #52, and then quickly realized a couple of little issues:
typescript-eslint, and need to check if rome really replaces it
edtf is a dependency; not a dev-dependency
So how to do extension?
In CSL 1.0, we didn't, because we were unsure of the implications.
If we wanted to in XML, we would namespace all attributes, and allow foreign-namespaced attributes in certain places.
<if foo:bar="true" cs:variable="book">
That would be easy in some ways (probably take 30-60 minutes of schema work), but a major breaking change in others (code would need updating; also, all styles, though that wouldn't be hard).
In typescript, we export an interface, which allows us, or others, to add new properties, and then in the generated JSON Schema allow other properties in those "certain places" (like Condition).
Caveat: currently Condition is a type, which can't be extended in this way, so need to figure that out.
My read of the docs says it's not possible, and would somehow need to refactor Condition to use interfaces instead.
But maybe this is saying it is possible with type aliases?
type Foobar = 'FOO' | 'BAR';
type FoobarBaz = Foobar | 'BAZ'; // or: 'BAZ' | Foobar
Or here:
type PartialPointX = { x: number; };
type Point = PartialPointX & { y: number; };
In earlier iterations of my thinking, I was imagining templates which could take arbitrary lists of piped function names.
If this is still promising:
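type Fn = (n: number) => number;
const sum: Fn = (n) => n + 10;
const double: Fn = (n) => n * 2;
const divide: Fn = (n) => n / 2;
// Apply the next function to the accumulated result.
const combine = (result: number, nextFun: Fn) => nextFun(result);
// Compose left to right: pipe(f, g)(x) is g(f(x)).
const pipe = (...fns: Fn[]) => (x: number) => fns.reduce(combine, x);
const result = pipe(sum, double, divide)(10); // ((10 + 10) * 2) / 2 = 20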
https://www.telerik.com/blogs/functional-programming-typescript
https://dev.to/ecyrbe/how-to-use-advanced-typescript-to-define-a-pipe-function-381h
https://dev.to/nexxeln/implementing-the-pipe-operator-in-typescript-30ip
... perhaps ideally one inspired by the 1.0 one, at least ultimately, if there's a need?
Or maybe could write initially in something like jest or vitest, but later convert it to something language-agnostic?
Seems the primary downside for jest ATM is "experimental" support for ES modules, while vitest is based on ESM and esbuild.
EDIT: node has its own test runner now, marked stable in v20. Currently needs some tweaks to work with typescript, but I expect those to go away as they improve it.
https://github.com/scottwillmoore/node-test-with-typescript
If jest:
#59 mostly addresses this; aside from use of the @Type decorator to address nested Contributor arrays.
For this purpose, need to go back to the CSL 1.1 approach.
Not sure the best approach, as that's maybe heavier-weight than needed? I do think it's clean though from a modelling POV.
My current plan, carried over from discussions for the v1.1 branch of CSL, standardizes on an EDTF string for input, and I'm hoping to convert to an EDTF Date object when deserializing in #46 (though this may be too ambitious, given there are many different types of objects in EDTF).
But formatting?
I'd prefer to avoid reintroducing the 1.0 date/date-part structures if possible, since it's another piece of template complexity.
The JS toLocaleDateString method looks like a pretty awesome approach.
> const dateConfig = { weekday: "long", year: "numeric", month: "short", day: "numeric"}
> d1.toLocaleDateString('en-us', dateConfig)
'Thursday, Apr 27, 2023'
> d1.toLocaleDateString('es', dateConfig)
'jueves, 27 de abr de 2023'
> d1.toLocaleDateString('de', dateConfig)
'Donnerstag, 27. Apr. 2023'
> d1.toLocaleDateString('ja', dateConfig)
'2023年4月27日木曜日'
const d1 = edtf("2016-10");
const d_month_day = { month: "short", day: "numeric" };
const d_full = { year: "numeric", month: "long", day: "numeric" };
console.log(d1.toLocaleDateString("en-us", d_month_day));
console.log(d1.toLocaleDateString("es", d_full));
console.log(d1.year);
Of note:
See also the newer Intl.DateTimeFormat, which is similar:
> (new Intl.DateTimeFormat('en-GB', { dateStyle: 'full', timeStyle: 'long', timeZone: 'Australia/Sydney' }).format(d1));
'Thursday, 27 April 2023 at 22:04:49 GMT+10'
But I think we may be able to get away with adding Dates as an OptionGroup?
I think I'll try to go with this first, as it keeps things simple, but promises to be international-friendly.
If it didn't work, would need ability to localize within styles, with approaches like this:
date.format(now, 'ddd, MMM DD YYYY');
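A rough sketch of the Intl-based approach, where a style picks from a few named formats and the locale does the rest. The format names and the formatDate helper are hypothetical, and a plain Date stands in for an EDTF date object.

```typescript
// Hypothetical named date formats a style could select from; the option
// objects themselves are standard Intl.DateTimeFormat options.
type DateFormatName = "year" | "yearMonth" | "full";

const dateFormats: Record<DateFormatName, Intl.DateTimeFormatOptions> = {
  year: { year: "numeric" },
  yearMonth: { year: "numeric", month: "long" },
  full: { year: "numeric", month: "long", day: "numeric" },
};

function formatDate(date: Date, format: DateFormatName, locale: string): string {
  // Pin the time zone so the calendar date does not shift with the host's zone.
  const options = { ...dateFormats[format], timeZone: "UTC" };
  return new Intl.DateTimeFormat(locale, options).format(date);
}

const issued = new Date(Date.UTC(2016, 9, 1)); // 1 October 2016
console.log(formatDate(issued, "yearMonth", "en-US"));
console.log(formatDate(issued, "yearMonth", "es"));
```

The style only names a format; month names, ordering, and punctuation come from the runtime's locale data, so exact output can vary slightly between runtimes.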
Also, I think options.group.key should be plural.
This, and maybe set some default values.
constructor(...args) {
super(...args);
}
See:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Classes/constructor
Probably need to revert back to the CSL 1.0 design for this:
sort:
- key: author
order: ascending
- key: year
order: descending
This is wrong:
With the last option, it should just be:
sort: as-cited
There's no need there for the more complicated modeling.
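A sketch of how a processor might consume that shape, either a list of key/order pairs or the bare as-cited token. The Ref fields and the per-key comparators are assumptions for illustration.

```typescript
// Hypothetical shapes for illustration.
type SortKey = "author" | "year";

interface SortSpec {
  key: SortKey;
  order: "ascending" | "descending";
}

type SortConfig = SortSpec[] | "as-cited";

interface Ref {
  author: string;
  year: number;
}

// One comparator per sort key.
const comparators: Record<SortKey, (a: Ref, b: Ref) => number> = {
  author: (a, b) => a.author.localeCompare(b.author),
  year: (a, b) => a.year - b.year,
};

function sortReferences(refs: Ref[], sort: SortConfig): Ref[] {
  if (sort === "as-cited") return refs; // keep the input (citation) order
  const specs: SortSpec[] = sort;
  // Copy before sorting; later keys only break ties left by earlier ones.
  return [...refs].sort((a, b) => {
    for (const { key, order } of specs) {
      const cmp = comparators[key](a, b);
      if (cmp !== 0) return order === "ascending" ? cmp : -cmp;
    }
    return 0;
  });
}

const refs: Ref[] = [
  { author: "Smith", year: 2020 },
  { author: "Doe", year: 2019 },
  { author: "Doe", year: 2021 },
];

const sorted = sortReferences(refs, [
  { key: "author", order: "ascending" },
  { key: "year", order: "descending" },
]);
console.log(sorted.map((r) => `${r.author} ${r.year}`));
```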
In RNC, I can do:
foo = ( one | two | three )+, four
That means, at least one of the first three, and a required "four".
I need to do something similar in places, like Condition, where I need to require a match property, and at least one of the other properties.
How do I model that?
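One possible TypeScript answer is a mapped type that builds a union with one variant per key, where that key is required and the rest stay optional. The Predicates properties below are hypothetical, loosely following examples elsewhere in these notes, and how a schema generator renders such a type is not verified here.

```typescript
// Hypothetical predicate properties for illustration.
interface Predicates {
  hasVariable: string[];
  isRefType: string[];
  isDate: string;
}

// Require at least one key of T: a union with one variant per key, where that
// key is required and all the others stay optional.
type AtLeastOne<T> = {
  [K in keyof T]: Pick<T, K> & Partial<Omit<T, K>>;
}[keyof T];

// `match` is always required, plus at least one predicate.
type Condition = { match: "all" | "any" | "none" } & AtLeastOne<Predicates>;

const ok: Condition = { match: "any", hasVariable: ["issued"] };
const alsoOk: Condition = { match: "all", isRefType: ["book"], isDate: "issued" };

// This would be a compile-time error, because no predicate is given:
// const bad: Condition = { match: "any" };

console.log(Object.keys(ok), Object.keys(alsoOk));
```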
This is a placeholder ATM that reflects the intention to define CSLDate as either a literal string, or EDTF date.
It would use third-party libraries for parsing and schema validation.
I suspect this will be fairly easy, with some interesting design choices for dealing with formatting of complex EDTF dates.
EDIT: The jsdoc format incantation to get EDTF validation into the schema is:
/**
* @format edtf/level-1+season-intervals
*/
Just need to also allow fallback to literal string. Or maybe that's not possible, and better done elsewhere?
I merged the simple solution below in the linked commit. Will need to think about the formatting question, however.
It may be enough to have the values in the array be template names?
Again, though we'd want to confirm this with the existing styles, it seems there are a small number of these, with by far the most important being for missing authors.
Perhaps we might do something like this globally?
substitutions:
- author:
- editor
- translator
- title
Or even:
authorSubstitutions:
- editor
- translator
- title
I do realize there are some nuances here (like the above very common pattern often needs a macro for title formatting), but perhaps we can resolve that, and avoid adding back another piece of template complexity?
PS - this command will match across the styles directory, and show the preceding six lines. Not surprisingly, the above logic is extremely common, though I don't know of a convenient way to write a script to quickly quantify that.
grep -B 6 "</substitute" ../csl/styles/*.csl
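A sketch of how a global author substitution list like the one above could be resolved. The Ref shape and the resolveAuthor helper are hypothetical, and the title-formatting nuance mentioned earlier is deliberately ignored.

```typescript
// Hypothetical shapes for illustration.
type Role = "author" | "editor" | "translator";
type Substitute = Role | "title";

interface Ref {
  title?: string;
  author?: string[];
  editor?: string[];
  translator?: string[];
}

const authorSubstitutions: Substitute[] = ["editor", "translator", "title"];

// Return what to render in the author position: the authors if present,
// otherwise the first non-empty substitute, in list order.
function resolveAuthor(ref: Ref, subs: Substitute[]): string[] {
  if (ref.author && ref.author.length > 0) return ref.author;
  for (const sub of subs) {
    const value =
      sub === "title" ? (ref.title ? [ref.title] : []) : (ref[sub] ?? []);
    if (value.length > 0) return value;
  }
  return [];
}

console.log(resolveAuthor({ title: "Some Title", editor: ["Doe"] }, authorSubstitutions));
console.log(resolveAuthor({ title: "Some Title" }, authorSubstitutions));
```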
Something seems a bit off ATM.
Need to be able to do stuff like this.
This currently is valid:
- when:
- hasVariable:
- issued
# - accessed
match: any
format:
- delimiter: " "
format:
- template: one
- template: two
... but this is not (and so is a bug):
else:
format:
- delimiter: ", "
format:
- template: one
- template: two
EDIT: I think that outer format isn't allowed there, because unnecessary.
But we shouldn't require duplication to change a delimiter.
I think, however, we need to allow and encourage the below (and maybe somehow also move that to a parameter)?
- delimiter: " "
format: one
cc @adam3smith - here's an answer to one of your questions, though there's an inconsistency I need to fix.
If we get through #7, next step would be models for the other key components; notably citations, references, locators, and terms.
On the latter, my impression is the current CSL term model is sound, so it would just need translating into typescript.
Use chore instead.
Not really sure why it should be restricted to the citation or bibliography contexts?
Now I'm wondering if it all needs a bit of restructuring; say:
title: Foo
options:
grouping:
templates:
citation:
options:
contexts:
options:
templates:
So consistent distinction between options and templates, available at all levels, where lower-levels override higher levels?
Or maybe the current distinction between templates (available only globally, and in external files) and formats (available only at lower levels), makes sense?
Or to go back to the "flatter" idea, maybe options is only available at the top-level, and lower-level parameters are also configured there? It's easier to program, I'd think.
Both deno and bun are much faster on the simple main.ts app than node, so seems sensible to try to retain compatibility across them.
With #91, I took a step toward deno compatibility, which may be sufficient for now.
I think this works as expected, but need to test a bit more:
> import { Processor } from "https://cdn.jsdelivr.net/gh/bdarcus/csl-next.js/src/processor.ts";
undefined
> const citeProc = new Processor();
undefined
> citeProc
Processor { style: undefined, bibliography: undefined }
See:
https://github.com/denoland/fresh
https://github.com/denoland/dnt
May want to add an import map, and support for it in esbuild?
https://github.com/trygve-lie/esbuild-plugin-import-map
Migrate to a deno-first approach entirely.
https://github.com/bdarcus/csln-deno-test
Here the typescript is targeting deno, but I use dnt to transform it into an npm module.
The advantage of this setup:
Since I've switched to Deno, the approach probably changes a bit.
These projects may present some ideas:
https://github.com/lumeland/lume
This article suggests:
For new packages, don’t publish index or library modules. Publish a module for each package export that can be deep imported. It’s perfectly valid to not have a main field in the package.json.
For packages with index or library modules, remove them in a SemVer major release. To help users migrate, in the changelog entry and GitHub release markdown use diff code blocks to show exactly how each possible import in a project should be updated.
Here's a project by the author that implements the ideas: https://github.com/jaydenseric/test-director.
https://stackoverflow.com/a/43951115/13860420
https://www.typescriptlang.org/docs/handbook/modules.html#default-exports
This is related to a big question about the processor model; if I want to continue with the current Class-based approach.
If I don't, that may suggest reorganizing something like the below, where each exports a default function:
src/
processor/
sort.ts
group.ts
date.ts
title.ts
locators.ts
contributors.ts
I added AJV for validation, but didn't realize it has more advanced features.