retextjs / retext-spell
plugin to check spelling
Home Page: https://unifiedjs.com
License: MIT License
Start caching more information such as the array of suggestions returned by nspell so that when a word is pulled from the cache the list of suggestions is available to iterate over.
If suggestions have already been created for a specific word the reason
string is placed in the cache and if that word is checked again the reason
string is taken from the cache rather than using nspell. When the cache is used it only contains the reason
string and not the array of suggestions.
It would be useful if the cache contained the array of suggestions too so that both the reason and the original array of suggestions are present on the returned Message
object.
If we look at the suggestions when we misspell the word hello
, the first time we can see the list of suggestions is populated on the message.expected
attribute like so:
{
"data": {},
"messages": [
{
"message": "`helo` is misspelt; did you mean `hello`, `help`, `helot`, `halo`, `held`, `hell`, `helm`, `hero`?",
"name": "1:1-1:5",
"reason": "`helo` is misspelt; did you mean `hello`, `help`, `helot`, `halo`, `held`, `hell`, `helm`, `hero`?",
"line": 1,
"column": 1,
"location": {
"start": {
"line": 1,
"column": 1,
"offset": 0
},
"end": {
"line": 1,
"column": 5,
"offset": 4
}
},
"source": "retext-spell",
"ruleId": "helo",
"fatal": false,
"actual": "helo",
"expected": [
"hello",
"help",
"helot",
"halo",
"held",
"hell",
"helm",
"hero"
]
}
],
"history": [],
"cwd": "/",
"contents": "helo"
}
If the word helo
is checked again the reason
string is pulled from the cache and the returned Message
object looks like this:
{
"data": {},
"messages": [
{
"message": "`helo` is misspelt; did you mean `hello`, `help`, `helot`, `halo`, `held`, `hell`, `helm`, `hero`?",
"name": "1:1-1:5",
"reason": "`helo` is misspelt; did you mean `hello`, `help`, `helot`, `halo`, `held`, `hell`, `helm`, `hero`?",
"line": 1,
"column": 1,
"location": {
"start": {
"line": 1,
"column": 1,
"offset": 0
},
"end": {
"line": 1,
"column": 5,
"offset": 4
}
},
"source": "retext-spell",
"ruleId": "helo",
"fatal": false,
"actual": "helo",
"expected": []
}
],
"history": [],
"cwd": "/",
"contents": "helo"
}
In the cached object the expected
array on the message object is empty.
In the case where a cached result is used the nspell suggestions
array should also be cached so that the message.expected
array is always populated like so:
{
"data": {},
"messages": [
{
"message": "`helo` is misspelt; did you mean `hello`, `help`, `helot`, `halo`, `held`, `hell`, `helm`, `hero`?",
"name": "1:1-1:5",
"reason": "`helo` is misspelt; did you mean `hello`, `help`, `helot`, `halo`, `held`, `hell`, `helm`, `hero`?",
"line": 1,
"column": 1,
"location": {
"start": {
"line": 1,
"column": 1,
"offset": 0
},
"end": {
"line": 1,
"column": 5,
"offset": 4
}
},
"source": "retext-spell",
"ruleId": "helo",
"fatal": false,
"actual": "helo",
"expected": [
"hello",
"help",
"helot",
"halo",
"held",
"hell",
"helm",
"hero"
]
}
],
"history": [],
"cwd": "/",
"contents": "helo"
}
reason
string to pull the information, possible to do but I think always getting back the suggestions array would be easier to work with. I am willing to do this work, but I just wanted to open an issue first in case there was any reason this should not be done. Thanks!
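A minimal sketch of the caching proposed above. It assumes a hypothetical `createChecker` wrapper and a stand-in `suggest` function in place of nspell; neither name comes from retext-spell itself. The cache entry holds the suggestions array alongside the reason string, so a repeated lookup returns both.

```javascript
// Hypothetical sketch: cache both the reason string and the suggestions array.
// `suggest` stands in for the spell checker's suggestion call; it is an
// assumption, not retext-spell's actual internals.
function createChecker(suggest) {
  const cache = new Map()

  return function check(word) {
    // Reuse the cached entry when the word was seen before.
    if (cache.has(word)) {
      const cached = cache.get(word)
      // Copy the array so callers cannot mutate the cached suggestions.
      return {reason: cached.reason, expected: [...cached.expected]}
    }

    const expected = suggest(word)
    let reason = '`' + word + '` is misspelt'
    if (expected.length > 0) {
      reason +=
        '; did you mean ' + expected.map((d) => '`' + d + '`').join(', ') + '?'
    }

    cache.set(word, {reason, expected})
    return {reason, expected: [...expected]}
  }
}

// Usage with a stub suggestion function that counts how often it is called.
let calls = 0
const check = createChecker((word) => {
  calls++
  return word === 'helo' ? ['hello', 'help'] : []
})

const first = check('helo')
const second = check('helo') // served from the cache
console.log(second.reason)
console.log(second.expected)
console.log(calls)
```

With this shape the second lookup never calls the suggestion function again, yet the caller still gets a populated `expected` array.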
Hey @wooorm! 👋
Is it possible to use this on the CLI, or within an editor that shows all of the issues at once in a block? I'm tired of manually looking for red underlines in Atom, and want a cleaner solution.
Would there be interest in a PR allowing regexps as well as strings in the ignore
array?
Example use-case: ignoring names of html headings like /h\d/
.
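One way such an option could behave, sketched with a hypothetical `isIgnored` helper (not part of retext-spell): strings are compared exactly, regular expressions are tested against the word. The pattern here is anchored, which is an assumption; the unanchored `/h\d/` from the example would also match words that merely contain `h` followed by a digit.

```javascript
// Hypothetical helper: `ignore` may mix plain strings and regular expressions.
function isIgnored(word, ignore) {
  return ignore.some((pattern) =>
    pattern instanceof RegExp ? pattern.test(word) : pattern === word
  )
}

const ignore = ['retext', /^h\d$/]

console.log(isIgnored('h2', ignore)) // true (matches the regular expression)
console.log(isIgnored('retext', ignore)) // true (matches the string)
console.log(isIgnored('helo', ignore)) // false
```

Regular expressions with the `g` or `y` flag keep state between `test` calls, so a real implementation would probably want to reject or reset them.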
I tried plugging v1.0.0 of retext-spell
into quality-docs
, but it wasn't flagging any spelling errors. I thought maybe the spelling errors were being swallowed be some other filter I had in my code, so I stripped everything out, and eventually tried the usage example from the README;
var retext = require('retext');
var spell = require('retext-spell');
var dictionary = require('dictionary-en-gb');
var report = require('vfile-reporter');

retext()
  .use(spell, dictionary)
  .process('Some useles mispelt documeant.', function (err, file) {
    console.error(report(err || file));
  });
When I save that as index.js
and run it with node index.js
, I get;
no issues found
So I think version 1.0.0 is not flagging spelling errors at all.
I would like to check spelling on an online editor in a web app.
I see that the dictionaries use file-system. Is there a way to bypass that?
I think numbers and words with punctuation should be excluded by this plugin. It doesn't make much sense to see results like this;
55:6-55:32 warning some-filename.json is misspelled spelling
59:115-59:124 warning 250 is misspelled spelling
Although it might make sense to include things like e.g.
or well-known
, so I'm not 100% sure about this. What are your thoughts @wooorm? Do you think excluding words with punctuation from the plugin would be the right solution, or do you think there's a better solution? You've got a lot more experience than I do writing code that deals with natural human language. 🤓
TLDR: see PR 38 and PR 39 that I've opened against nspell
.
Note: I've changed the name of this issue from "Mysterious capital E returned for misspelled 5-letter nouns with a single capital T" to "Unexpected capital letters returned for certain capitalized misspellings," and I've edited this post slightly to reflect the broader scope.
I had originally found this error for capitalized variants of 16 dictionary-en words: "tepee", "thane", "thole", "three", "throe", "tilde", "tinge", "tonne", "toque", "tribe", "trike", "trope", "trove", "truce", "tuque", and "twine". To give one notable example, any misspelling matching this RegEx /^Thre[f-ln-racuvxyz]$/
is corrected to "ThreE" instead of "Three."
Edit The below algorithm produces misspellings of the original 16 dictionary words, but I have since found 190 additional 5 letter words that occasionally occur in retext-spell
vfile
messages with extraneous capital letters. I have saved these new words and the list of misspellings needed to generate them in a json file bundled with the gist for this issue.
The gist to reproduce this issue tests misspellings generated as such:
dictionary-en
word starting with "T" and ending in "e"/MS
in index.dic
Torte
(due to the second "t")
If the misspellings do not match a different dictionary word more closely than the originally selected 5-letter word, then the first "expected" value in the vfile
message emitted by retext-spell
will be the originally selected 5-letter word with final "e" mistakenly capitalized as "E".
Edit Without getting into the details of nspell
's keyboard groups, there is no easy way to generate the 190 newly discovered 5-letter words that do not match the misspellings generated with the above method.
I've created a gist.
Execute the following commands to download the gist and install dependencies:
git clone https://gist.github.com/a4e2ff11cd868a5b40a65b3c53c8574a.git
cd a4e2ff11cd868a5b40a65b3c53c8574a
npm install
Run one of the following commands to test with various suffixes:
npm run test "*" (for no suffix)
npm run test "*s" (for the plural)
npm run test "*'s" (for the possessive)
Side note: In contrast to the examples that produce the bug defined in this issue, you can run npm run test "t"
, npm run test "ts"
, and npm run test "t's"
to see the results of misspellings that fail to produce the bug due to the presence of a lowercase "t" in the misspelling.
All the logged vfile
message reasons should show suggested values without unusual capitalization. The hundreds of misspellings tested with npm run test "*"
, npm run test "*s"
, and npm run test "*'s"
should generate suggested values with lowercase "e" characters. For example, the first tested misspelling Tepea
should generate a top suggested value of "Tepee". The plural Tepeas
should generate a top suggested value of "Tepees". The possessive Tepea's
should generate a top suggested value of "Tepee's".
The hundreds of misspellings tested with npm run test "*"
, npm run test "*s"
, and npm run test "*'s"
all generate suggested values with uppercase "E" characters. For example, the first tested misspelling Tepea
generates a top suggested value of "TepeE". The plural Tepeas
generates a top suggested value of "TepeEs". The possessive Tepea's
generates a top suggested value of "TepeE's".
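As a rough way to scan logged suggestions for this symptom, the sketch below uses a hypothetical `hasStrayCapital` helper that is not part of the gist, nspell, or retext-spell. It flags any suggestion in which an uppercase ASCII letter directly follows a lowercase one.

```javascript
// Hypothetical check: a suggestion has a "stray" capital when an uppercase
// ASCII letter directly follows a lowercase one (as in "TepeE").
function hasStrayCapital(suggestion) {
  return /[a-z][A-Z]/.test(suggestion)
}

// Suggested values quoted in this issue, expected and actual.
const suggestions = ['Tepee', 'TepeE', 'Tepees', 'TepeEs', "TepeE's", 'Three', 'ThreE']
const stray = suggestions.filter(hasStrayCapital)

console.log(stray) // [ 'TepeE', 'TepeEs', "TepeE's", 'ThreE' ]
```

This heuristic would also flag legitimate mixed-case words such as "iPhone", so it only suits a word list like the one here, where every correct form is capitalized on the first letter alone.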
Hello, I've been using this (and a number of your other retexts!) and it's been working great except sometimes the strings in question have URLs in them which always get flagged. I'm not sure if this is a problem that I should handle on my end (and if so, I would love any advice you have), but if you think it would be useful, perhaps there could be an 'ignoreURL' or even 'ignoreRegex' option?
Perhaps you could use something like https://www.npmjs.com/package/valid-url if you were to add this in.
6.0.0
No response
I attempted to run the example in the project README using the following steps:
Create a new Node project:
npm init es6 -y
Install dependencies:
npm install --save-exact dictionary-en retext retext-spell vfile-reporter
Copy example:
echo "import dictionaryEn from 'dictionary-en'
import {retext} from 'retext'
import retextSpell from 'retext-spell'
import {reporter} from 'vfile-reporter'
const file = await retext()
  .use(retextSpell, {dictionary: dictionaryEn})
  .process('Some useles documeant.')
console.error(reporter(file))" > index.js
Run example:
node index.js
This throws an error:
file:///some/path/retext-spell-repro/node_modules/retext-spell/lib/index.js:106
throw new TypeError('Missing `dictionary` in options')
^
TypeError: Missing `dictionary` in options
at Function.retextSpell (file:///some/path/retext-spell-repro/node_modules/retext-spell/lib/index.js:106:11)
at Function.freeze (file:///some/path/retext-spell-repro/node_modules/unified/lib/index.js:636:36)
at Function.process (file:///some/path/retext-spell-repro/node_modules/unified/lib/index.js:716:10)
at file:///some/path/retext-spell-repro/index.js:8:4
Node.js v21.5.0
I would expect the code to run, with the example output given in the project README produced.
Arch Linux
No response
Is there a way to disable errors for words it does not recognize?
For example, when I am writing technical content with people's names, I get hundreds of occurrences that I would have to add to my dictionary or ignore list. Is there a way to only check words for which it has confidence or suggestions?
Is it possible to allow more than one dictionary for retext-spell? I have an open source project where I need to use an English dictionary for general spell checking and a project-specific one for the technical/unique words.
I need an affix file for the project-specific one, so I can't use the personal dictionary option. NSpell supports multiple dictionaries being passed.
I received a warning from Sonatype DepShield that retext-spell is using a vulnerable version of lodash.includes and points to this advisory.
So maybe you can check if this repo can be updated to lodash 4.17.5.
I'm not too experienced with JavaScript, but this should be possible by replacing
package.json:30
- "lodash.includes": "^4.2.0",
+ "lodash": ">= 4.17.5",
and
index.js:7
- var includes = require('lodash.includes')
+ var includes = require('lodash/includes')
If you want to get the same report, either enable Sonatype DepShield in your GitHub account or create a new repository using retext-spell and only enable DepShield for this repository (you can limit access to single repos on setup).
5.1.0
https://peaceful-mclean-b30fef.netlify.app
Good morning, today is 2:41pm.
TL;DR: I expect that the retext-spell plugin wouldn't return any warnings for times in common formats like 2:41pm
.
First, retext and unified as a whole is an incredible project. Thank you for making and maintaining this.
I'm working on creating a JavaScript prose linter for anyone who writes copy in our UI. It's accessible via a browser or in a Figma Plugin since it's primarily meant for product designers to use. The prose linter that I've created uses many existing retext plugins (like retext-spell) and several custom plugins for rules to help our teams adhere to our writing style guide. Since my company is a software monitoring tool, we often use times (e.g. 2:41pm) in our UI and it's very plausible that folks will attempt to use my plugin on text layers that contain times.
I'm taking advantage of the ignoreDigits
option and it's working wonderfully! I was hoping to find another option that allows consumers of the project to control whether to ignore words that contain any digits as opposed to only digits (as the ignoreDigits
option does).
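The difference between the two behaviours can be sketched with two hypothetical predicates; neither is an existing retext-spell option. One matches words made up of digits only, which is what this post describes `ignoreDigits` as covering, and the other matches words containing any digit.

```javascript
// Hypothetical predicates; the names are illustrative only.
// Only digits: the behaviour this post attributes to `ignoreDigits`.
const isOnlyDigits = (word) => /^\d+$/.test(word)
// Any digit: the behaviour being requested.
const containsDigit = (word) => /\d/.test(word)

for (const word of ['250', '2:41pm', 'morning']) {
  console.log(word, isOnlyDigits(word), containsDigit(word))
}
// 250 true true
// 2:41pm false true
// morning false false
```

A time such as 2:41pm is only skipped by the second predicate, which is why the existing option does not cover it.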
macOS Monterey 12.2.1
webpack, Vite