zhlint's Introduction

zhlint

A linting tool for Chinese text content.

How to install

You can install zhlint through npm, yarn, or pnpm:

# install through npm
npm install zhlint -g

# or through yarn
yarn global add zhlint

# or through pnpm
pnpm add zhlint -g

Usage

As CLI

# glob files, lint them, and print validation report,
# and exit with code `1` if there is any error found.
zhlint <file-pattern>

# glob files and fix all fixable errors found in them.
zhlint <file-pattern> --fix

# lint the file and output fixed content into another file
zhlint <input-file-path> --output=<output-file-path>

# print usage info
zhlint --help

The validation report is printed to the terminal.

Advanced usage

zhlint also supports rc and ignore config files for custom rules:

# .zhlintrc by default
zhlint --config <filepath>

# .zhlintignore by default
zhlint --ignore <filepath>
zhlint --file-ignore <filepath>

# .zhlintcaseignore by default
zhlint --case-ignore <filepath>

# current directory by default
zhlint --dir <path>

In the rc config file, you can write a JSON like:

{
  "preset": "default",
  "rules": {
    "adjustedFullwidthPunctuation": ""
  }
}

For more details, see supported rules.

In the file-ignore file, you can list the files to ignore using .gitignore syntax.

In the case-ignore file, you can write some lines of ignored cases like:

( , )

For more details, see setup ignored cases.

As Node.js package

const { run, report } = require('zhlint')

const value = '自动在中文和English之间加入空格'
const options = { rules: { preset: 'default' } }
const output = run(value, options)

// print '自动在中文和 English 之间加入空格'
console.log(output.result)

// print validation report
report([output])

The validation report is formatted like this:

1:6 - 此处中英文内容之间需要一个空格

自动在中文和English之间加入空格
      ^

1:13 - 此处中英文内容之间需要一个空格

自动在中文和English之间加入空格
             ^
Invalid files:
- foo.md

Found 2 errors.

Advanced usage

zhlint also supports rc and ignore config files for custom rules:

const { readRc, runWithConfig } = require('zhlint')

const value = '自动在中文和English之间加入空格'

const dir = '...' // the target directory path
const configPath = '...' // the config file path
const fileIgnorePath = '...' // the file-ignore file path
const caseIgnorePath = '...' // the case-ignore file path

const config = readRc(dir, configPath, fileIgnorePath, caseIgnorePath)
const output = runWithConfig(value, config)

// ... further actions

As a standalone package

A standalone version is available as the JavaScript file dist/zhlint.js. For example, you can load it in a browser with a <script> tag, which exposes a global variable zhlint.

API

  • run(str: string, options?: Options): Result: Lint a certain content.
    • parameters:
      • str: The text content you want to lint.
      • options: Some options to config.
    • returns:
      • The result of a single piece of input string. It contains the fixed text content as result and the info of all validations.
  • report(results: Result[], logger?: Console): void: Print out the validation reports for each file.
    • parameters:
      • results: An array for all linted results.
      • logger: The logger instance, by default it's console in Node.js/browser.
  • readRc(dir: string, config: string, fileIgnore: string, caseIgnore: string, logger?: Console): Config: Read config from rc & ignore file(s).
  • runWithConfig(str: string, config: Config): Result: Lint a certain content with rc config.

Options

Customize your own linting config and other advanced options.

type Options = {
  rules?: RuleOptions
  hyperParse?: string[]
  ignoredCases?: IgnoredCase[]
  logger?: Console
}
  • rules: customize the linting config. It could be undefined, which means linting nothing. It could be { preset: 'default' }, which just uses the default config. For more details of RuleOptions, please see supported rules.
  • hyperParse: customize the hyper parsers by their names. It could be undefined, which means using the default ignored cases parser, Markdown parser, and Hexo tags parser.
  • ignoredCases: provide exception cases which you would like to skip.
  • logger: same as the parameter in report(...).

RC Config

  • preset: string (optional)
  • rules: RuleOptions without the preset field. (optional)
  • hyperParsers: string[] (optional)
  • caseIgnores: string[] and the priority is lower than .zhlintcaseignore. (optional)

Output

type Result = {
  // the basic info and availability of the file
  file?: string
  disabled: boolean

  // the original content of the file
  origin: string

  // the finally fixed content of the file
  result: string

  // all the error messages
  validations: Validation[]
}

type Validation = {
  message: string
  index: number
  length: number
}
  • Result
    • file: The file name. It's an optional field which is only used in CLI.
    • origin: the original text content.
    • result: the finally fixed text content.
    • validations: All the validation information.
  • Validation
    • index: The index of the target token in the input string.
    • length: The length of the target token in the input string.
    • message: The description of this validation in natural language.
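
The report format shown earlier labels each validation with a line:column position such as 1:6, while a Validation only carries a flat index. Below is a minimal sketch of one way to derive that position. It assumes a 1-based line and a 0-based column, which matches 1:6 and 1:13 in the sample report; toPosition is a hypothetical helper, not part of zhlint's API.

```javascript
// Hypothetical helper: convert a flat string index into { line, column }.
// Assumption: line is 1-based and column is 0-based.
function toPosition(text, index) {
  const before = text.slice(0, index)
  const lines = before.split('\n')
  return { line: lines.length, column: lines[lines.length - 1].length }
}

const sample = '自动在中文和English之间加入空格'
const position = toPosition(sample, 6) // index of the "E" in "English"
console.log(`${position.line}:${position.column}`)
```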

Features

Markdown syntax support

zhlint lints text content in Markdown syntax by default. For example:

run('自动在_中文_和**English**之间加入空格', options)

It first analyses the Markdown syntax, extracts the pure text content, and lints it. The fixed text is then put back into the raw Markdown string and returned as the result field of the output.

Hexo tags syntax support

We also support the Hexo tags syntax, because when Hexo is used to build the Vue.js website, the Markdown source files often include such special tags, which would otherwise lead to unpredictable results.

As a result, we additionally skip the Hexo-style tags by default. For example:

run(
  '现在过滤器只能用在插入文本中 (`{% raw %}{{ }}{% endraw %}` tags)。',
  options
)
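
To illustrate the idea of skipping Hexo-style tags, here is a minimal sketch that applies a fixer only to the text outside of the tags. It assumes tags can be located with a regular expression and treats a raw ... endraw pair as one protected block. fixOutsideHexoTags and the fix callback are hypothetical names, and this is not zhlint's actual hexo hyper parser.

```javascript
// Assumption: a Hexo tag is either a whole `{% raw %}...{% endraw %}` block
// or a single `{% ... %}` tag. The capture group keeps matches in split().
const HEXO_TAG = /(\{%\s*raw\s*%\}[\s\S]*?\{%\s*endraw\s*%\}|\{%[\s\S]*?%\})/

// Hypothetical helper: run `fix` on plain text only, leave tags untouched.
function fixOutsideHexoTags(text, fix) {
  return text
    .split(HEXO_TAG)
    // Odd positions hold the captured tags, even positions hold plain text.
    .map((part, i) => (i % 2 === 1 ? part : fix(part)))
    .join('')
}

// Usage with a stand-in fixer that just upper-cases the text.
const demo = fixOutsideHexoTags('a b {% raw %}{{ x }}{% endraw %} c', (s) => s.toUpperCase())
console.log(demo)
```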

Setup ignored cases

In some real cases, text content deliberately does not follow the rules. We can use the ignoredCases option to handle that. For example, we may want to keep the spaces inside a pair of brackets, which is invalid by default. To do so, add one more line of HTML comment anywhere inside the file:

<!-- the good case -->

text before (text inside) text after

<!-- the bad case -->

vm.$on( event, callback )

<!-- then we could write this down below to make it work -->
<!-- zhlint ignore: ( , ) -->

or just pass it through as an option:

run(str, { ignoredCases: [{ textStart: '( ', textEnd: ' )' }] })

If you want to ignore the whole file, you can also add this HTML comment:

<!-- zhlint disabled -->
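
The ignore comment and the ignoredCases option above describe the same case in two notations. The following sketch shows how the comment form might be mapped to the object form. It assumes only the two-part textStart,textEnd shape shown above and splits at the first comma; whitespace directly after "ignore:" and directly before "-->" is dropped. findIgnoredCases is a hypothetical helper, not zhlint's own parser.

```javascript
// Hypothetical helper: collect ignored cases declared in HTML comments.
// Assumption: only the two-part "textStart,textEnd" form is handled.
function findIgnoredCases(content) {
  const pattern = /<!--\s*zhlint ignore:\s*(.+?)\s*-->/g
  const cases = []
  for (const match of content.matchAll(pattern)) {
    const [textStart, textEnd] = match[1].split(',')
    cases.push({ textStart, textEnd })
  }
  return cases
}

const source = 'vm.$on( event, callback )\n<!-- zhlint ignore: ( , ) -->'
console.log(findIgnoredCases(source))
```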

Supported preprocessors (hyper parsers)

  • ignore: find all ignored pieces by the HTML comment <!-- zhlint ignore: ... -->
  • hexo: find all Hexo tags to avoid them being parsed.
  • markdown: parse by markdown syntax and find all block-level texts and inline-level marks.

Supported rules

Almost all the rules come from past translation experience with the W3C Requirements for Chinese Text Layout, the W3C HTML Chinese interest group, and the Vue.js Chinese docsite.

... and this part might be controversial. So if you disagree with any of them, we would love to know and improve. Opening an issue is always welcome, so that we can discuss a better option or decision.

type RuleOptions = {
  /* PRESET */

  // Custom preset, currently only support:
  // - `'default'`
  preset?: string

  /* PUNCTUATIONS */

  // Convert these punctuations into halfwidth.
  // default preset: `()`
  // e.g. `(文字)` -> `(文字)`
  halfwidthPunctuation?: string

  // Convert these punctuations into fullwidth.
  // default preset: `,。:;?!“”‘’`
  // e.g. `文字,文字.` -> `文字,文字。`
  fullwidthPunctuation?: string

  // Treat these fullwidth punctuations as halfwidth ones
  // when processing the spaces around them, since punctuations
  // such as quotation marks are only rendered in halfwidth
  // by modern Chinese fonts.
  // default preset: `“”‘’`
  adjustedFullwidthPunctuation?: string

  // Convert traditional Chinese punctuations into simplified ones or vice versa.
  // default preset: `simplified`
  // e.g. `「文字」` -> `“文字”`
  //
  // besides the above, we also unify some common punctuations below:
  //
  // // U+2047 DOUBLE QUESTION MARK, U+203C DOUBLE EXCLAMATION MARK
  // // U+2048 QUESTION EXCLAMATION MARK, U+2049 EXCLAMATION QUESTION MARK
  // '??': ['⁇'],
  // '!!': ['‼'],
  // '?!': ['⁈'],
  // '!?': ['⁉'],
  //
  // // U+002F SOLIDUS, U+FF0F FULLWIDTH SOLIDUS
  // '/': ['/', '/'],
  //
  // // U+FF5E FULLWIDTH TILDE
  // '~': ['~', '~'],
  //
  // // U+2026 HORIZONTAL ELLIPSIS, U+22EF MIDLINE HORIZONTAL ELLIPSIS
  // '…': ['…', '⋯'],
  //
  // // U+25CF BLACK CIRCLE, U+2022 BULLET, U+00B7 MIDDLE DOT,
  // // U+2027 HYPHENATION POINT, U+30FB KATAKANA MIDDLE DOT
  // '·': ['●', '•', '·', '‧', '・'],
  //
  // advanced usage: you can also specify a more detailed map like:
  //
  // ```
  // {
  //   default: true, // follow all the default preset
  //   '「': ['“', '【'], // convert `“` or `【` into `「`
  //   '」': ['”', '】'], // convert `”` or `】` into `」`
  //   '…': true, // follow the default preset for this character
  //   '·': false, // not unify any of these characters
  // }
  // ```
  unifiedPunctuation?:
    | 'traditional'
    | 'simplified'
    | (Record<string, boolean | string[]> & { default: boolean })

  // Special case: skip `fullwidthPunctuation` for abbreviations.
  // default preset:
  // `['Mr.','Mrs.','Dr.','Jr.','Sr.','vs.','etc.','i.e.','e.g.','a.k.a']`
  skipAbbrs?: string[]

  /* SPACES AROUND LETTERS */

  // default preset: `true`
  // - `true`: one space
  // - `undefined`: do nothing
  // e.g. `foo  bar` -> `foo bar`
  spaceBetweenHalfwidthContent?: boolean

  // default preset: `true`
  // - `true`: zero space
  // - `undefined`: do nothing
  // e.g. `文 字` -> `文字`
  noSpaceBetweenFullwidthContent?: boolean

  // default preset: `true`
  // - `true`: one space
  // - `false`: zero space
  // - `undefined`: do nothing
  // e.g. `文字 foo文字` -> `文字 foo 文字` (`true`)
  // e.g. `文字foo 文字` -> `文字foo文字` (`false`)
  spaceBetweenMixedwidthContent?: boolean

  // Special case: skip `spaceBetweenMixedwidthContent`
  // for numbers x Chinese units.
  // default preset: `年月日天号时分秒`
  skipZhUnits?: string

  /* SPACES AROUND PUNCTUATIONS */

  // default preset: `true`
  // - `true`: zero space
  // - `undefined`: do nothing
  // e.g. `文字 ,文字` -> `文字,文字`
  noSpaceBeforePauseOrStop?: boolean

  // default preset: `true`
  // - `true`: one space
  // - `false`: zero space
  // - `undefined`: do nothing
  // e.g. `文字,文字` -> `文字, 文字` (`true`)
  // e.g. `文字, 文字` -> `文字,文字` (`false`)
  spaceAfterHalfwidthPauseOrStop?: boolean

  // default preset: `true`
  // - `true`: zero space
  // - `undefined`: do nothing
  // e.g. `文字, 文字` -> `文字,文字`
  noSpaceAfterFullwidthPauseOrStop?: boolean

  /* SPACES AROUND QUOTATIONS */

  // default preset: `true`
  // - `true`: one space
  // - `false`: zero space
  // - `undefined`: do nothing
  // e.g. `文字 "文字"文字` -> `文字 "文字" 文字` (`true`)
  // e.g. `文字"文字" 文字` -> `文字"文字"文字` (`false`)
  spaceOutsideHalfwidthQuotation?: boolean

  // default preset: `true`
  // - `true`: zero space
  // - `undefined`: do nothing
  // e.g. `文字 “文字” 文字` -> `文字“文字”文字`
  noSpaceOutsideFullwidthQuotation?: boolean

  // default preset: `true`
  // - `true`: zero space
  // - `undefined`: do nothing
  // e.g. `文字“ 文字 ”文字` -> `文字“文字”文字`
  noSpaceInsideQuotation?: boolean

  /* SPACES AROUND BRACKETS */

  // default preset: `true`
  // - `true`: one space
  // - `false`: zero space
  // - `undefined`: do nothing
  spaceOutsideHalfwidthBracket?: boolean

  // default preset: `true`
  // - `true`: zero space
  // - `undefined`: do nothing
  noSpaceOutsideFullwidthBracket?: boolean

  // default preset: `true`
  // - `true`: zero space
  // - `undefined`: do nothing
  noSpaceInsideBracket?: boolean

  /* SPACES AROUND CODE */

  // default preset: `true`
  // - `true`: one space
  // - `false`: zero space
  // - `undefined`: do nothing
  // e.g. '文字 `code`文字' -> '文字 `code` 文字' ('true')
  // e.g. '文字`code` 文字' -> '文字`code`文字' ('false')
  spaceOutsideCode?: boolean

  /* SPACES AROUND MARKDOWN/HTML WRAPPERS */

  // default `true`
  // - `true`: zero space
  // - `undefined`: do nothing
  // e.g. `文字** foo **文字` -> `文字 **foo** 文字`
  noSpaceInsideHyperMark?: boolean

  /* SPACES AT THE BEGINNING/END */

  // default `true`
  // e.g. ` 文字 ` -> `文字`
  trimSpace?: boolean

  /* SKIP PURE WESTERN SENTENCES */

  // default `true`
  skipPureWestern?: boolean
}
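
To make the spaceBetweenMixedwidthContent rule (with value true) concrete, here is a deliberately simplified sketch based on two regular expressions. It assumes fullwidth content is limited to the CJK Unified Ideographs block and halfwidth content to ASCII letters and digits. It ignores skipZhUnits (so 2019年 would become 2019 年), Markdown marks, and punctuation, all of which the real rules take into account. spaceMixedWidth is a hypothetical name and this is not zhlint's implementation.

```javascript
// Hypothetical, simplified version of "one space between mixed-width content".
// Assumption: fullwidth = U+4E00..U+9FFF, halfwidth = ASCII letters and digits.
function spaceMixedWidth(text) {
  return text
    // fullwidth character directly followed by a halfwidth one
    .replace(/([\u4e00-\u9fff])([A-Za-z0-9])/g, '$1 $2')
    // halfwidth character directly followed by a fullwidth one
    .replace(/([A-Za-z0-9])([\u4e00-\u9fff])/g, '$1 $2')
}

console.log(spaceMixedWidth('自动在中文和English之间加入空格'))
```

Text that already has the spaces passes through unchanged, since both patterns only match characters that are directly adjacent.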

More information

zhlint is open source on GitHub, and issues are welcome.

zhlint's People

Contributors

dependabot[bot], jinjiang, justineo, liuyib

zhlint's Issues

Parsing `x‘x’x` fails

As the title says: parsing x‘x’x with zhlint throws an error

Uncaught Error: Unmatched closed quote  at 0
    at to (/xxx/zhlint/dist/zhlint.umd.js:1:6792)
    at /xxx/zhlint/dist/zhlint.umd.js:52:19518
    at Array.map (<anonymous>)
    at Up (/xxx/zhlint/dist/zhlint.umd.js:52:19467)

Version: v0.5.1

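v0.4.x ~ v0.5.x plan

  • Toolchain upgrade (Vite, VitePress, Vitest, ESLint, Prettier) v0.4.0
    • bugfix v0.4.1
  • Rewrite in TypeScript (adding type definitions) v0.5.0
  • Fix all accumulated bugs v0.5.x

Further future plans

  • Provide a more user-friendly and flexible way to configure (the current options are geared towards the implementation)
  • Provide a friendlier debug view or tool (tracing the problems found and the changes made by every rule, one by one)
  • Run zhlint against the latest Vue Chinese docs repository, and add the problems found as test cases here
  • Tech design docs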

Support curly braces in markdown

Example: # 简介 {#introduction}

  • Current result: # 简介{ #introduction} + error report: "There should be no space before a punctuation."
  • Expected result: # 简介 {#introduction}
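
One possible way to reach the expected result is to set the {#id} suffix aside before linting the heading text and to append it back afterwards. Below is a minimal sketch, assuming the anchor always sits at the very end of an ATX heading and only contains word characters and hyphens. splitHeadingAnchor is a hypothetical helper and not something zhlint provides.

```javascript
// Hypothetical helper: split a trailing `{#id}` anchor off a Markdown heading.
// Assumption: the anchor is the last thing on the heading line.
function splitHeadingAnchor(line) {
  const match = line.match(/^(#{1,6}\s.*?)(\s*\{#[\w-]+\})\s*$/)
  if (!match) return { text: line, anchor: '' }
  return { text: match[1], anchor: match[2] }
}

const heading = splitHeadingAnchor('# 简介 {#introduction}')
// Only heading.text would be linted; the anchor is re-attached unchanged.
console.log(heading.text + heading.anchor)
```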

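Prevent Chinese punctuation from appearing on the inner side of markdown marks such as bold

Quoted from the Vue Chinese translation wiki

  1. In bold passages, move the final period, comma, or colon to the right of ** only when it affects markdown rendering; otherwise leave it as is. (Discussion: vuejs/docs-next-zh-cn#49, vuejs/docs-next-zh-cn#78)
// Case 1:
 **相对胜于晦涩。**例如,一个 `<BlogPost>` 组件例子比 `<CurrencyExchangeSettings>` 更好。 // VuePress cannot render this
(changed to:)
**相对胜于晦涩**。例如,一个 `<BlogPost>` 组件例子比 `<CurrencyExchangeSettings>` 更好。  // the period is moved to the right of the bold mark

// Case 2: rendering is not affected, so leave it as is
**你好,世界。** ==> **你好,世界。** (rendering is not affected, so leave it as is)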

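The technical difficulty of this check is that the analysis and the fix have to happen before the markdown parser runs. zhlint does not support this yet, and has not gone as far as customizing the markdown parser.

If we do implement it, the logic should be relatively independent of the other zhlint features.
Solution to be discussed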
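
As input to that discussion, the rule quoted from the wiki could be prototyped as a plain-text pass that runs before any markdown parsing. The sketch below assumes that rendering breaks exactly when the closing ** is preceded by one of 。,: and directly followed by a character that is neither whitespace nor punctuation, which approximates the CommonMark flanking rules with the \p{P} class. movePunctuationOutOfBold is a hypothetical name, and bold spans containing a single * are not handled.

```javascript
// Hypothetical pre-pass: move a trailing 。 , or : out of a bold span
// when the closing ** is directly followed by ordinary text.
function movePunctuationOutOfBold(line) {
  // ** + content without asterisks + final punctuation + ** + ordinary char
  const pattern = /\*\*([^*]+?)([。,:])\*\*(?=[^\s\p{P}])/gu
  return line.replace(pattern, '**$1**$2')
}

console.log(movePunctuationOutOfBold('**相对胜于晦涩。**例如'))
console.log(movePunctuationOutOfBold('**你好,世界。**'))
```

The lookahead leaves a bold span alone when it ends the line or is followed by a space, which corresponds to case 2 above.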

More recorded examples:

  • 异步组件默认就是 **“suspensible”** 的

v0.6.0 plan

Low-level refactoring

  • Refactor the parser
  • Refactor the rule set

API updates

  • rules & defaultConfig & preset

Documentation updates

  • Options: rules
  • Supported rules

ReferenceError: invalidFiles is not defined

yarn global add zhlint

Then, when testing against a file:

zhlint ./test/hh.md

An error occurs: ReferenceError: invalidFiles is not defined

--- Addendum ---
It looks like the problem is at #L72: the parameter is not defined. I suggest adding ESLint.

Bug: space in blockquotes

e.g.

> [Live Demo ](https://vue-hn.herokuapp.com/)
> 注:如果在一段时间内没有人访问过该网站,则需要一些加载时间。
>

Bug: hyphen between number

e.g.

The content of xx.md is: 1-1

Running zhlint xx.md produces the following error

[start] xxxx.md
TypeError: Cannot read property 'content' of undefined
    at module.exports (/Users/lily/.nvm/versions/node/v13.3.0/lib/node_modules/zhlint/src/rules/case-math-exp.js:56:60)
    at travel (/Users/lily/.nvm/versions/node/v13.3.0/lib/node_modules/zhlint/src/travel.js:21:7)
    at processRule (/Users/lily/.nvm/versions/node/v13.3.0/lib/node_modules/zhlint/src/process-rule.js:17:3)
    at /Users/lily/.nvm/versions/node/v13.3.0/lib/node_modules/zhlint/src/run.js:127:52
    at Array.forEach (<anonymous>)
    at /Users/lily/.nvm/versions/node/v13.3.0/lib/node_modules/zhlint/src/run.js:127:36
    at Array.map (<anonymous>)
    at run (/Users/lily/.nvm/versions/node/v13.3.0/lib/node_modules/zhlint/src/run.js:124:54)
    at /Users/lily/.nvm/versions/node/v13.3.0/lib/node_modules/zhlint/bin/index.js:35:39
    at Array.map (<anonymous>)


v0.7.1 plan

(Testing GitHub Tasklist)

### v0.7.1
- [ ] #58 
- [ ] #79 
- [ ] #77 
- [ ] #91 

Support more options

  • half/full-width punctuation: config one-by-one
  • special brackets space: '3 minute(s) left'
  • special punctuation space: false|'left'|'right'|'both'|'keep'
    • special combo: '2019-06-26 12:00', '2019年06月26号'

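Incorrect spacing around bold Chinese text in some markdown

As the title says: zhlint handles the spaces incorrectly in some cases

  • a* 啊 *
    • Current result: a *啊* (with a trailing space)
    • Expected result: a *啊*
  • * 啊 *
    • Current result: * 啊* (with an extra space inside)
    • Expected result: *啊*

Version: v0.5.1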

Incompatible with lint-staged

The zhlint command currently requires the input file to be the first argument, so zhlint --fix cannot be used in lint-staged the way eslint --fix can...
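
One way around this is to parse the arguments without relying on their order, so that a command like zhlint --fix followed by the staged files would work. Below is a minimal sketch under that assumption. parseArgs is a hypothetical function, not zhlint's CLI code, and it only understands --flag and --flag=value, not the space-separated --flag value form.

```javascript
// Hypothetical parser: accept flags and file patterns in any order.
// Assumption: flags look like `--name` or `--name=value`.
function parseArgs(argv) {
  const flags = {}
  const files = []
  for (const arg of argv) {
    if (arg.startsWith('--')) {
      const eq = arg.indexOf('=')
      const name = eq === -1 ? arg.slice(2) : arg.slice(2, eq)
      flags[name] = eq === -1 ? true : arg.slice(eq + 1)
    } else {
      // Anything that is not a flag is treated as a file pattern.
      files.push(arg)
    }
  }
  return { flags, files }
}

// lint-staged appends the staged files after the configured command.
console.log(parseArgs(['--fix', 'docs/a.md', 'docs/b.md']))
```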

v0.2

  • check error | fix
    • validation records
    • delegate stdout and stderr
    • --validate (exit code)
    • --fix
    • glob files
    • ci test
    • log format
  • type defs
  • docs
  • gh pages (vuepress)
  • build for browser (standalone version)
  • publish

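Fullwidth quotation marks actually take only half width on the latest macOS

As a result, if there is no space outside a fullwidth quotation mark, it sits very close to the neighboring characters and looks uncomfortable.
Whether other operating systems or environments behave the same way remains to be confirmed.

Regular font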

12345678
abcdefgh
""''“”‘’

Monospace font

12345678
abcdefgh
""''“”‘’

We could consider adding an option that forces a space between the outside of fullwidth quotation marks and the surrounding text.

Combine with Hexo tags

https://hexo.io/docs/tag-plugins.html

{% raw %}
content
{% endraw %}

{% blockquote [author[, source]] [link] [source_link_title] %}
content
{% endblockquote %}

{% codeblock [title] [lang:language] [url] [link text] [additional options] %}
code snippet
{% endcodeblock %}

{% pullquote [class] %}
content
{% endpullquote %}

{% jsfiddle shorttag [tabs] [skin] [width] [height] %}
{% gist gist_id [filename] %}
{% iframe url [width] [height] %}
{% img [class names] /path/to/image [width] [height] '"title text" "alt text"' %}
{% link text url [external] [title] %}
{% include_code [title] [lang:language] [from:line] [to:line] path/to/file %}
{% youtube video_id %}
{% vimeo video_id [width] [height] %}
{% post_path filename %}
{% post_link filename [title] [escape] %}
{% asset_path filename %}
{% asset_img filename [title] %}
{% asset_link filename [title] [escape] %}

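Improve the marker output in the error report

Current state

Currently, in the error report, we first print the original text fragment, and then place a caret (^) at the proper position on the second line to point at the error more precisely.

Because Chinese and English characters have different widths, the current implementation first renders all the text before the caret on the second line in black on a black background (so it is visually invisible), whether Chinese or English, and then renders the caret, which aligns it at the desired position. But this approach is not good enough: not every terminal has a black background, so the assumption does not hold. Besides, in terminal environments without color support, people see an error report like this: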

自动在中文和English之间加入空格
自动在中文和^

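Solution

Replace the output before the caret with fullwidth and halfwidth spaces. Specifically, count how many halfwidth and fullwidth characters precede the caret, output the same number of halfwidth and fullwidth spaces on the second line, and then output the caret.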
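
The proposal can be sketched as follows. This version emits one matching space per preceding character instead of counting first, which yields the same total width. The isFullwidth check is a rough assumption based on a few code point ranges rather than the full East Asian Width data, and both function names are hypothetical.

```javascript
// Rough assumption: treat these code point ranges as fullwidth.
function isFullwidth(ch) {
  const code = ch.codePointAt(0)
  return (
    (code >= 0x2e80 && code <= 0xa4cf) || // CJK radicals, kana, ideographs
    (code >= 0xac00 && code <= 0xd7a3) || // Hangul syllables
    (code >= 0xf900 && code <= 0xfaff) || // CJK compatibility ideographs
    (code >= 0xff01 && code <= 0xff60) || // fullwidth forms
    (code >= 0xffe0 && code <= 0xffe6) // fullwidth signs
  )
}

// Build the second line of the report: padding spaces followed by a caret.
function buildCaretLine(text, index) {
  let padding = ''
  for (const ch of text.slice(0, index)) {
    // U+3000 is the fullwidth (ideographic) space.
    padding += isFullwidth(ch) ? '\u3000' : ' '
  }
  return padding + '^'
}

const reportText = '自动在中文和English之间加入空格'
console.log(reportText)
console.log(buildCaretLine(reportText, 6))
```

Because the padding consists only of spaces, the output no longer depends on the terminal's background color or color support.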

Can we support a vscode plugin?

zhlint is awesome! But I also want to use it in VS Code to lint Chinese text. It would be very powerful if it were supported in VS Code, with quick fixes to correct Chinese text.

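Support .zhlintrc configuration

Current state

zhlint supports some simple rule configuration. However, the existing internal rules are in fact not user-friendly: they are decoupled mostly from an implementation point of view, and differ from the common needs for configuring Chinese text formatting.

Solution

Support a config file, tentatively named .zhlintrc, in a tentative JSON format. Support options that are friendly to user needs, and adapt them behind the scenes to the existing rule implementations and internal options.

TODO: the concrete rc file options are yet to be designed

Others

We could also consider, at the same time:

  1. Supporting a custom path to the rc config file
  2. Supporting configuration through a zhlint field in package.json.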
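
The two ideas above can be combined into a single lookup. The sketch below is only one possible design: the priority order (rc file content first, then the zhlint field of package.json, then a default preset) is an assumption, and resolveConfig together with its parameter names is hypothetical. It takes already-loaded content as input so that it stays independent of the file system.

```javascript
// Hypothetical resolver. Assumed priority: rc file content first,
// then the `zhlint` field of package.json, then the default preset.
function resolveConfig({ rcText, packageJson } = {}) {
  if (typeof rcText === 'string' && rcText.trim() !== '') {
    // The rc file format is tentatively JSON.
    return JSON.parse(rcText)
  }
  if (packageJson && typeof packageJson.zhlint === 'object' && packageJson.zhlint !== null) {
    return packageJson.zhlint
  }
  return { preset: 'default' }
}

console.log(resolveConfig({ packageJson: { name: 'docs', zhlint: { preset: 'default' } } }))
```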
