retextjs / retext-spell


68.0 6.0 17.0 189 KB

plugin to check spelling

Home Page: https://unifiedjs.com

License: MIT License

JavaScript 100.00%

retext retext-plugin spell spell-check spell-checker natural-language

retext-spell's People

janwirth anukat2015 bkeepers nodatall jdrew1303 miguelramosfdz tbroadley raipc viktorbengtsson damianofusco x4121 patrickarlt beingathar tvquizphd sshyran shyransystems deangaffney

retext-spell's Issues

Cache suggestions array

Subject of the feature

Start caching more information such as the array of suggestions returned by nspell so that when a word is pulled from the cache the list of suggestions is available to iterate over.

Problem

If suggestions have already been created for a specific word the reason string is placed in the cache and if that word is checked again the reason string is taken from the cache rather than using nspell. When the cache is used it only contains the reason string and not the array of suggestions.

It would be useful if the cache contained the array of suggestions too so that both the reason and the original array of suggestions are present on the returned Message object.

If we look at the suggestions when we misspell the word hello, the first time we can see the list of suggestions is populated on the message.expected attribute like so:

{
  "data": {},
  "messages": [
    {
      "message": "`helo` is misspelt; did you mean `hello`, `help`, `helot`, `halo`, `held`, `hell`, `helm`, `hero`?",
      "name": "1:1-1:5",
      "reason": "`helo` is misspelt; did you mean `hello`, `help`, `helot`, `halo`, `held`, `hell`, `helm`, `hero`?",
      "line": 1,
      "column": 1,
      "location": {
        "start": {
          "line": 1,
          "column": 1,
          "offset": 0
        },
        "end": {
          "line": 1,
          "column": 5,
          "offset": 4
        }
      },
      "source": "retext-spell",
      "ruleId": "helo",
      "fatal": false,
      "actual": "helo",
      "expected": [
        "hello",
        "help",
        "helot",
        "halo",
        "held",
        "hell",
        "helm",
        "hero"
      ]
    }
  ],
  "history": [],
  "cwd": "/",
  "contents": "helo"
}

If the word helo is checked again the reason string is pulled from the cache and the returned Message object looks like this:

{
  "data": {},
  "messages": [
    {
      "message": "`helo` is misspelt; did you mean `hello`, `help`, `helot`, `halo`, `held`, `hell`, `helm`, `hero`?",
      "name": "1:1-1:5",
      "reason": "`helo` is misspelt; did you mean `hello`, `help`, `helot`, `halo`, `held`, `hell`, `helm`, `hero`?",
      "line": 1,
      "column": 1,
      "location": {
        "start": {
          "line": 1,
          "column": 1,
          "offset": 0
        },
        "end": {
          "line": 1,
          "column": 5,
          "offset": 4
        }
      },
      "source": "retext-spell",
      "ruleId": "helo",
      "fatal": false,
      "actual": "helo",
      "expected": []
    }
  ],
  "history": [],
  "cwd": "/",
  "contents": "helo"
}

In the cached result, the expected array on the message object is empty.

Expected behavior

When a cached result is used, the nspell suggestions array should also be cached so that the message.expected array is always populated like so:

{
  "data": {},
  "messages": [
    {
      "message": "`helo` is misspelt; did you mean `hello`, `help`, `helot`, `halo`, `held`, `hell`, `helm`, `hero`?",
      "name": "1:1-1:5",
      "reason": "`helo` is misspelt; did you mean `hello`, `help`, `helot`, `halo`, `held`, `hell`, `helm`, `hero`?",
      "line": 1,
      "column": 1,
      "location": {
        "start": {
          "line": 1,
          "column": 1,
          "offset": 0
        },
        "end": {
          "line": 1,
          "column": 5,
          "offset": 4
        }
      },
      "source": "retext-spell",
      "ruleId": "helo",
      "fatal": false,
      "actual": "helo",
      "expected": [
        "hello",
        "help",
        "helot",
        "halo",
        "held",
        "hell",
        "helm",
        "hero"
      ]
    }
  ],
  "history": [],
  "cwd": "/",
  "contents": "helo"
}
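
A minimal sketch of the proposed cache, not the plugin's actual code: it stores the suggestions array next to the reason string so that a cache hit can still populate `expected`. The names `createChecker`, `fakeSuggest`, and the shape of the cache entry are assumptions made for illustration; `fakeSuggest` stands in for whatever nspell call produces the suggestions.

```javascript
// Hypothetical helper: wraps a suggestion function with a cache that keeps
// both the reason string and the suggestions array for each word.
function createChecker(suggest) {
  const cache = new Map()

  return function check(word) {
    let entry = cache.get(word)

    if (!entry) {
      const expected = suggest(word)
      const quoted = expected.map((d) => '`' + d + '`').join(', ')
      const reason =
        '`' + word + '` is misspelt' +
        (expected.length > 0 ? '; did you mean ' + quoted + '?' : '')
      entry = {reason, expected}
      cache.set(word, entry)
    }

    // Copy the array so callers cannot mutate the cached suggestions.
    return {actual: word, reason: entry.reason, expected: [...entry.expected]}
  }
}

// Stand-in for the spell checker (assumption); counts calls to show the cache is hit.
let calls = 0
function fakeSuggest(word) {
  calls++
  return word === 'helo' ? ['hello', 'help'] : []
}

const check = createChecker(fakeSuggest)
const first = check('helo')
const second = check('helo') // served from the cache, suggestions included
console.log(second.expected, calls)
```

With this shape, the second lookup returns the same suggestions as the first without calling the spell checker again.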

Alternatives

Add a cache in my own code that maps a word to its array of suggestions. This is possible, but adding it to the library would make it available to other users who want to iterate over suggestions.
Adapt my code to parse the reason string and pull the suggestions out of it. This is also possible, but I think always getting back the suggestions array would be easier to work with.
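
For the second alternative, a rough sketch of recovering suggestions from the reason string. It assumes the reason keeps the format shown in the messages above, with the misspelt word first and every suggestion wrapped in backticks; `suggestionsFromReason` is a hypothetical name, and the approach would break if the wording or quoting ever changed.

```javascript
// Hypothetical parser: pulls every backticked value out of a reason string.
function suggestionsFromReason(reason) {
  const quoted = Array.from(reason.matchAll(/`([^`]+)`/g), (match) => match[1])
  // The first backticked value is the misspelt word itself; the rest are suggestions.
  return quoted.slice(1)
}

const reason = '`helo` is misspelt; did you mean `hello`, `help`, `helot`?'
const parsed = suggestionsFromReason(reason)
console.log(parsed)
```

This works, but it ties calling code to the exact message wording, which is the fragility the issue is trying to avoid.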

Note

I am willing to do this work, but I just wanted to open an issue first in case there was any reason this should not be done. Thanks!

Integrate with CLI or Atom

Hey @wooorm! 👋

Is it possible to use this on the CLI, or within an editor that shows all of the issues at once in a block? I'm tired of manually looking for red underlines in Atom, and want a cleaner solution.

Ignore RegExp

Would there be interest in a PR allowing regexps as well as strings in the ignore array?

Example use-case: ignoring names of html headings like /h\d/.
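
One way such an option could behave, sketched under assumptions: `isIgnored` is a hypothetical helper, not part of retext-spell, and the heading pattern is anchored here as `/^h\d$/` so it only matches whole words such as `h2`.

```javascript
// Hypothetical matcher: `ignore` may mix exact strings and regular expressions.
function isIgnored(word, ignore) {
  return ignore.some((pattern) =>
    pattern instanceof RegExp ? pattern.test(word) : pattern === word
  )
}

const ignore = ['retext', /^h\d$/]
console.log(isIgnored('h2', ignore), isIgnored('retext', ignore), isIgnored('helo', ignore))
```

Patterns with the `g` or `y` flag keep state between `test` calls, so a real implementation would probably want to reject or reset them.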

v1.0.0 not flagging spelling errors

I tried plugging v1.0.0 of retext-spell into quality-docs, but it wasn't flagging any spelling errors. I thought the spelling errors might be getting swallowed by some other filter in my code, so I stripped everything out and eventually tried the usage example from the README:

var retext = require('retext');
var spell = require('retext-spell');
var dictionary = require('dictionary-en-gb');
var report = require('vfile-reporter');

retext()
  .use(spell, dictionary)
  .process('Some useles mispelt documeant.', function (err, file) {
    console.error(report(err || file));
  });

When I save that as index.js and run it with node index.js, I get:

no issues found

So I think version 1.0.0 is not flagging spelling errors at all.

Not available via NPM :O(

Is it possible to use it on the client side?

I would like to check spelling on an online editor in a web app.
I see that the dictionaries use file-system. Is there a way to bypass that?

Numbers and words with punctuation should be excluded

I think numbers and words with punctuation should be excluded by this plugin. It doesn't make much sense to see results like this:

  55:6-55:32     warning  some-filename.json is misspelled  spelling
  59:115-59:124  warning  250 is misspelled                   spelling

Although it might make sense to include things like e.g. or well-known, so I'm not 100% sure about this. What are your thoughts @wooorm? Do you think excluding words with punctuation from the plugin would be the right solution, or do you think there's a better one? You've got a lot more experience than I do writing code that deals with natural human language. 🤓
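
A sketch of one possible heuristic, purely illustrative: skip tokens that are all digits or that look like filenames, while still checking hyphenated words and abbreviations. `shouldCheck` and both patterns are assumptions, not existing plugin behavior.

```javascript
// Hypothetical filter deciding whether a token is worth spell checking.
function shouldCheck(word) {
  // Digits only, such as "250".
  if (/^\d+$/.test(word)) return false
  // Ends in a dot followed by letters or digits, such as "some-filename.json".
  if (/\.[a-z0-9]+$/i.test(word)) return false
  return true
}

for (const word of ['some-filename.json', '250', 'well-known', 'e.g.']) {
  console.log(word, shouldCheck(word))
}
```

The filename pattern would also skip decimals such as `3.14`, so it trades some coverage for fewer false positives.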

Unexpected capital letters returned for certain capitalized misspellings

TLDR: see PR 38 and PR 39 that I've opened against nspell.

Subject of the issue

Note: I've changed the name of this issue from "Mysterious capital E returned for misspelled 5-letter nouns with a single capital T" to "Unexpected capital letters returned for certain capitalized misspellings," and I've edited this post slightly to reflect the broader scope.

Background

I had originally found this error for capitalized variants of 16 dictionary-en words: "tepee", "thane", "thole", "three", "throe", "tilde", "tinge", "tonne", "toque", "tribe", "trike", "trope", "trove", "truce", "tuque", and "twine". To give one notable example, any misspelling matching this RegEx /^Thre[f-ln-racuvxyz]$/ is corrected to "ThreE" instead of "Three."

Edit: The algorithm below produces misspellings of the original 16 dictionary words, but I have since found 190 additional 5-letter words that occasionally occur in retext-spell vfile messages with extraneous capital letters. I have saved these new words and the list of misspellings needed to generate them in a JSON file bundled with the gist for this issue.

Generating examples

The gist to reproduce this issue tests misspellings generated as follows:

Capitalize any 5-letter dictionary-en word starting with "T" and ending in "e"
Ensure that the word has affix code /MS in index.dic
Ensure that the word is not Torte (due to the second "t")
Replace the final "e" with a single letter (except "t")
Optionally add the plural "s" or the possessive "'s"

If the misspellings do not match a different dictionary word more closely than the originally selected 5-letter word, then the first "expected" value in the vfile message emitted by retext-spell will be the originally selected 5-letter word with final "e" mistakenly capitalized as "E".
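
The generation steps above can be sketched roughly as follows. This is not the code from the gist: `generateMisspellings` is a hypothetical name, the affix-code check is left out because it needs `index.dic`, and excluding "e" as a replacement letter (since it would reproduce the original word) is an assumption.

```javascript
// Hypothetical generator for the capitalized misspellings described above.
function generateMisspellings(words, suffix = '') {
  // Every letter except "t" (per the steps) and "e" (assumed: it yields the original word).
  const letters = 'abcdefghijklmnopqrstuvwxyz'
    .split('')
    .filter((letter) => letter !== 'e' && letter !== 't')
  const result = []

  for (const word of words) {
    // Keep 5-letter words that start with "t" and end in "e", except "torte".
    if (!/^t[a-z]{3}e$/.test(word) || word === 'torte') continue
    const stem = 'T' + word.slice(1, -1)
    for (const letter of letters) result.push(stem + letter + suffix)
  }

  return result
}

const plain = generateMisspellings(['tepee', 'torte', 'three', 'hello'])
const possessive = generateMisspellings(['tepee'], "'s")
console.log(plain.length, plain[0], possessive[0])
```

Each qualifying word yields 24 candidates per suffix; whether a given candidate triggers the bug still depends on which dictionary word the checker considers closest.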

Edit: Without getting into the details of nspell's keyboard groups, there is no easy way to generate the 190 newly discovered 5-letter words that do not match the misspellings generated with the above method.

Your environment

OS: MacOS Sierra 10.12.6
Packages: dictionary-en==3.0.1, retext==7.0.1, retext-spell==4.0.0
Env: node==15.3.0, npm==7.0.14

Steps to reproduce

I've created a gist.

Execute the following commands to download the gist and install dependencies:

git clone https://gist.github.com/a4e2ff11cd868a5b40a65b3c53c8574a.git
cd a4e2ff11cd868a5b40a65b3c53c8574a
npm install

Run one of the following commands to test with various suffixes:

npm run test "*" (for no suffix)
npm run test "*s" (for the plural)
npm run test "*'s" (for the possessive)

Side note: In contrast to the examples that produce the bug defined in this issue, you can run npm run test "t", npm run test "ts", and npm run test "t's" to see the results of misspellings that fail to produce the bug due to the presence of a lowercase "t" in the misspelling.

Expected behavior

All the logged vfile message reasons should show suggested values without unusual capitalization. The hundreds of misspellings tested with npm run test "*", npm run test "*s", and npm run test "*'s" should generate suggested values with lowercase "e" characters. For example, the first tested misspelling Tepea should generate a top suggested value of "Tepee". The plural Tepeas should generate a top suggested value of "Tepees". The possessive Tepea's should generate a top suggested value of "Tepee's".

Actual behavior

The hundreds of misspellings tested with npm run test "*", npm run test "*s", and npm run test "*'s" all generate suggested values with uppercase "E" characters. For example, the first tested misspelling Tepea generates a top suggested value of "TepeE". The plural Tepeas generates a top suggested value of "TepeEs". The possessive Tepea's generates a top suggested value of "TepeE's".

Ignore URLs?

Hello, I've been using this (and a number of your other retexts!) and it's been working great except sometimes the strings in question have URLs in them which always get flagged. I'm not sure if this is a problem that I should handle on my end (and if so, I would love any advice you have), but if you think it would be useful, perhaps there could be an 'ignoreURL' or even 'ignoreRegex' option?

Perhaps you could use something like https://www.npmjs.com/package/valid-url if you were to add this in.
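
As an alternative to an extra dependency, a sketch that leans on the built-in WHATWG `URL` constructor available in Node.js and browsers. `looksLikeUrl` is a hypothetical helper, and limiting the check to `http:` and `https:` is an assumption about which URLs matter here.

```javascript
// Hypothetical check: true when the token parses as an http(s) URL.
function looksLikeUrl(token) {
  let url
  try {
    url = new URL(token)
  } catch {
    // `new URL` throws a TypeError for strings that are not absolute URLs.
    return false
  }
  return url.protocol === 'http:' || url.protocol === 'https:'
}

console.log(looksLikeUrl('https://www.npmjs.com/package/valid-url'))
console.log(looksLikeUrl('useles'))
```

Bare hosts such as `www.example.com` have no scheme and would not be detected, so calling code might still need a looser pattern for those.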

Example in project README does not work

Initial checklist

I read the support docs
I read the contributing guide
I agree to follow the code of conduct
I searched issues and couldn’t find anything (or linked relevant results below)

Affected packages and versions

6.0.0

Link to runnable example

No response

Steps to reproduce

I attempted to run the example in the project README using the following steps:

Create a new Node project:
```
npm init es6 -y
```

Install dependencies:

npm install --save-exact dictionary-en retext retext-spell vfile-reporter

Copy example:

echo "import dictionaryEn from 'dictionary-en'
import {retext} from 'retext'
import retextSpell from 'retext-spell'
import {reporter} from 'vfile-reporter'

const file = await retext()
  .use(retextSpell, {dictionary: dictionaryEn})
  .process('Some useles documeant.')

console.error(reporter(file))" > index.js

Run example:
```
node index.js
```

This throws an error:

file:///some/path/retext-spell-repro/node_modules/retext-spell/lib/index.js:106
    throw new TypeError('Missing `dictionary` in options')
          ^

TypeError: Missing `dictionary` in options
    at Function.retextSpell (file:///some/path/retext-spell-repro/node_modules/retext-spell/lib/index.js:106:11)
    at Function.freeze (file:///some/path/retext-spell-repro/node_modules/unified/lib/index.js:636:36)
    at Function.process (file:///some/path/retext-spell-repro/node_modules/unified/lib/index.js:716:10)
    at file:///some/path/retext-spell-repro/index.js:8:4

Node.js v21.5.0

Expected behavior

I would expect the code to run, with the example output given in the project README produced.

Affected runtime and version

Node.js v21.5.0

Affected package manager and version

[email protected]

Affected OS and version

Arch Linux

Build and bundle tools

No response

Too many false positives

Is there a way to disable errors for words it does not recognize?

For example, when I am writing technical content that includes people's names, I get hundreds of occurrences that I would have to add to my dictionary or ignore list. Is there a way to only check words for which it has confidence or suggestions?
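
One workaround on the calling side, sketched under assumptions: filter the reported messages down to those that carry suggestions in `expected`. `withSuggestions` is a hypothetical helper, the sample messages are made up, and whether unknown names really come back without suggestions depends on the dictionary.

```javascript
// Hypothetical post-filter: keep only messages that offer at least one suggestion.
function withSuggestions(messages) {
  return messages.filter(
    (message) => Array.isArray(message.expected) && message.expected.length > 0
  )
}

// Made-up sample messages for illustration.
const messages = [
  {actual: 'helo', expected: ['hello', 'help']},
  {actual: 'Wooorm', expected: []},
  {actual: 'unifiedjs'}
]

const kept = withSuggestions(messages)
console.log(kept.map((message) => message.actual))
```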

Multiple dictionaries

Is it possible to allow more than one dictionary for retext-spell? I have an open source project where I need to use an English dictionary for general spell checking and a project-specific one for the technical/unique words.

I need an affix file for the project-specific dictionary, so I can't use the personal dictionary option. NSpell supports multiple dictionaries being passed.

Possible security issue with lodash.includes

I received a warning from Sonatype DepShield that retext-spell is using a vulnerable version of lodash.includes and points to this advisory.

So maybe you can check if this repo can be updated to lodash 4.17.5.

I'm not too experienced with JavaScript, but this should be possible by replacing

package.json:30
- "lodash.includes": "^4.2.0",
+ "lodash": ">= 4.17.5",

and

index.js:7
- var includes = require('lodash.includes')
+ var includes = require('lodash/includes')

Steps to reproduce

If you want to get the same report, either enable Sonatype DepShield in your GitHub account or create a new repository using retext-spell and only enable DepShield for this repository (you can limit access to single repos on setup).

Ignore times in common formats (e.g. 2:41PM)

Initial checklist

I read the support docs
I read the contributing guide
I agree to follow the code of conduct
I searched issues and couldn’t find anything (or linked relevant results below)

Affected packages and versions

5.1.0

Link to runnable example

https://peaceful-mclean-b30fef.netlify.app

Steps to reproduce

Open the runnable example.
In the text field, paste the following string: Good morning, today is 2:41pm.
In the right pane, you'll see that a warning is returned for the time of day.

Expected behavior

TL;DR: I expect that the retext-spell plugin wouldn't return any warnings for times in common formats like 2:41pm.

Description

First, retext and unified as a whole is an incredible project. Thank you for making and maintaining this.

I'm working on creating a JavaScript prose linter for anyone who writes copy in our UI. It's accessible via a browser or in a Figma Plugin since it's primarily meant for product designers to use. The prose linter that I've created uses many existing retext plugins (like retext-spell) and several custom plugins for rules to help our teams adhere to our writing style guide. Since my company is a software monitoring tool, we often use times (e.g. 2:41pm) in our UI and it's very plausible that folks will attempt to use my plugin on text layers that contain times.

I'm taking advantage of the ignoreDigits option and it's working wonderfully! I was hoping to find another option that allows consumers of the project to control whether to ignore words that contain any digits as opposed to only digits (as the ignoreDigits option does).
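
The difference between the two behaviors can be sketched as follows. Only `ignoreDigits` comes from the plugin; `ignoreAnyDigits` and `shouldIgnore` are hypothetical names used for illustration.

```javascript
// Digits only, which is what the issue says `ignoreDigits` covers.
const isOnlyDigits = (word) => /^\d+$/.test(word)
// At least one digit anywhere in the word, which is what is being requested.
const containsDigit = (word) => /\d/.test(word)

// Hypothetical decision function combining both options.
function shouldIgnore(word, {ignoreDigits = false, ignoreAnyDigits = false} = {}) {
  if (ignoreAnyDigits && containsDigit(word)) return true
  if (ignoreDigits && isOnlyDigits(word)) return true
  return false
}

console.log(shouldIgnore('2:41pm', {ignoreDigits: true}))
console.log(shouldIgnore('2:41pm', {ignoreAnyDigits: true}))
```

A broader flag like this would also skip words such as `h2` or `v1`, which may or may not be wanted.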

Affected runtime and version

[email protected]

Affected package manager and version

[email protected]

Affected OS and version

macOS Monterey 12.2.1

Build and bundle tools

webpack, Vite
