myint / scspell
Spell checker for source code
Home Page: https://pypi.python.org/pypi/scspell3k
License: GNU General Public License v2.0
Probably requires #22
It'd be handy to be able to easily create dictionaries from existing libraries, especially the standard libraries of various languages. The idea is
E.g. händler: in Python 3.7, scspell finds a token ändler, and in 2.7 it finds a token ndler.
The same issue affects words with other diacritics.
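For illustration, here is a minimal sketch of Unicode-aware tokenization in Python 3. It is not scspell's actual tokenizer; the pattern and the helper names (unicode_tokens, ascii_tokens) are assumptions made up for this example. It only shows that a letter class wider than ASCII keeps the word in one piece.

```python
import re
import unicodedata

# Letters in any script: \w with digits and underscore excluded.
UNICODE_WORD = re.compile(r"[^\W\d_]+")
# ASCII-only pattern, shown for contrast.
ASCII_WORD = re.compile(r"[A-Za-z]+")


def unicode_tokens(text):
    """Return runs of letters, keeping accented characters inside the token."""
    # NFC folds "a" + combining diaeresis into the single code point "ä",
    # so decomposed input is not split at the combining mark.
    return UNICODE_WORD.findall(unicodedata.normalize("NFC", text))


def ascii_tokens(text):
    """Return runs of ASCII letters only; accented letters act as separators."""
    return ASCII_WORD.findall(text)


print(unicode_tokens("der händler"))  # ['der', 'händler']
print(ascii_tokens("der händler"))    # ['der', 'h', 'ndler']
```

The ASCII-only variant reproduces the kind of split described above, which suggests (but does not prove) that the character class is where the problem lies.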
All alphanumeric strings (strings of letters, numbers, and underscores) are spell-checked tokens.
It would be nice to have an option to define custom tokenizers (token separators), either in addition to the defaults or replacing them. E.g. I want underscores to act as separators, but I don't want dashes to.
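A rough sketch of what a configurable tokenizer could look like. The function make_tokenizer and its extra_word_chars parameter are hypothetical and are not part of scspell; the idea is just that the caller decides which punctuation stays inside a token.

```python
import re


def make_tokenizer(extra_word_chars=""):
    """Build a tokenizer; characters in extra_word_chars stay inside tokens."""
    # Hypothetical helper: ASCII letters and digits always belong to a token,
    # and the caller chooses which punctuation (e.g. "-" or "_") does too.
    pattern = re.compile("[A-Za-z0-9" + re.escape(extra_word_chars) + "]+")
    return pattern.findall


# Dashes kept inside tokens, underscores treated as separators.
dash_words = make_tokenizer("-")
print(dash_words("foo_bar-baz"))  # ['foo', 'bar-baz']

# The opposite choice: underscores kept, dashes separate.
underscore_words = make_tokenizer("_")
print(underscore_words("foo_bar-baz"))  # ['foo_bar', 'baz']
```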
I seem to be getting unusual errors on Windows.
For example, when I call scspell using a subprocess in my Python application, I get:
E File "C:\Python34-x64\lib\site-packages\coalib\misc\Shell.py", line 70, in run_interactive_shell_command
E process = Popen(command, **args)
E File "C:\Python34-x64\lib\subprocess.py", line 859, in __init__
E restore_signals, start_new_session)
E File "C:\Python34-x64\lib\subprocess.py", line 1114, in _execute_child
E startupinfo)
E OSError: [WinError 193] %1 is not a valid Win32 application
for scspell.py --report-only C:\path\to\file,
or simply a FileNotFound for scspell --report-only C:\path\to\file
Maybe it's because you're using distutils? Using setuptools' entry points may fix it. (We use setuptools in our app and it works well on both Windows and Linux.)
It would be great if scspell could print just a unique list of misspelled tokens. Lots of tools enable me to find the misspelled word in its context after the fact (ag, grep). I just want to use scspell's awesome token parsing abilities. Currently I'm doing something like this:
scspell --report-only files... | cut -d' ' -f 2 | sort -u
but it seems like it would be much more efficient to do it in the program itself.
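The shell pipeline above could be mirrored in Python roughly as follows. This is only a sketch of the post-processing step, not a change to scspell itself; the sample report lines are an assumed format, and unique_tokens is a made-up name. Unlike cut, it skips lines that have no second field.

```python
def unique_tokens(report_lines):
    """Collect the second space-separated field of each line, sorted and unique."""
    tokens = set()
    for line in report_lines:
        fields = line.rstrip("\n").split(" ")
        if len(fields) >= 2:
            tokens.add(fields[1])
    return sorted(tokens)


# Assumed report format, for illustration only.
sample = [
    "a.c:3: 'recieve' not found in dictionary",
    "b.c:9: 'recieve' not found in dictionary",
    "b.c:10: 'adress' not found in dictionary",
]
print(unique_tokens(sample))  # ["'adress'", "'recieve'"]
```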
I'm checking a rather large codebase. Right now I'm working entirely in the /src/ folder, and scspell src/* works, except when it reaches subdirectories. I get Error: can't read source file 'src/subdir'; skipping (reason: Is a directory). It would be super nice to have a commandline arg to specify how many levels of subdirectories to traverse.
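As a sketch of the requested behavior, a depth-limited traversal can be written with os.walk. The function name iter_files and the max_depth parameter are hypothetical; scspell does not expose them. Here max_depth=0 means only files directly inside the root.

```python
import os


def iter_files(root, max_depth):
    """Yield files under root, descending at most max_depth directory levels."""
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if depth >= max_depth:
            # Clearing dirnames in place stops os.walk from descending further.
            dirnames[:] = []
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)
```

The resulting paths could then be passed to scspell on the command line, e.g. list(iter_files("src", 2)) for src/ plus two levels of subdirectories.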
Sometimes files contain text in different languages, which you do not want to add to a file specific dictionary.
It would be nice if there was an option like:
// disable-scspell
bonjour monde
hallo welt
//enable-scspell
(can also be other types of comments, e.g. /* disable-scspell */)
to disable scspell for some lines of a file.
A mechanism like this exists in many linting tools, e.g. eslint, psalm, phpstan,... so I think it would be useful here too.
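One way the proposal could work, sketched in Python. The marker strings come from the request above; the function checkable_lines is hypothetical and nothing like it exists in scspell today. Matching on the marker text alone means any comment style works.

```python
DISABLE_MARK = "disable-scspell"  # hypothetical marker from the request above
ENABLE_MARK = "enable-scspell"    # hypothetical marker from the request above


def checkable_lines(lines):
    """Yield (line_number, text) for lines outside disable/enable regions."""
    enabled = True
    for number, text in enumerate(lines, start=1):
        if DISABLE_MARK in text:
            enabled = False
        elif ENABLE_MARK in text:
            enabled = True
        elif enabled:
            # Marker lines themselves are never yielded.
            yield number, text


source = [
    "int x;",
    "// disable-scspell",
    "bonjour monde",
    "hallo welt",
    "//enable-scspell",
    "int y;",
]
print(list(checkable_lines(source)))  # [(1, 'int x;'), (6, 'int y;')]
```

Keeping the original line numbers in the output means reported positions would still match the file.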
First of all, nice tool, thanks for doing that.
Is there a way to add a word to dictionary from command line (without interactive mode)?
I'm using this as a linter checker. It works great, however, when I'm in Vim I can't (actually I could but I don't want to) use it in interactive mode. But I still would like to have a way to add a word to dictionary. So something like scspell --add-to-natural-dictionary myfancyword would be great.
I get the following error while running scspell on https://github.com/k4rtik/htdp/blob/master/htdp-lib/lang/private/tp-dialog.rkt
$ scspell tp-dialog.rkt
Traceback (most recent call last):
File "/usr/local/bin/scspell", line 83, in <module>
scspell_lib.spell_check(files, opts.override_filename, opts.report)
File "/usr/local/lib/python2.7/site-packages/scspell_lib/__init__.py", line 617, in spell_check
spell_check_file(f, dicts, ignores, report_only)
File "/usr/local/lib/python2.7/site-packages/scspell_lib/__init__.py", line 504, in spell_check_file
data, m), filename, file_id, dicts, ignores, report_only)
File "/usr/local/lib/python2.7/site-packages/scspell_lib/__init__.py", line 448, in spell_check_token
(not dicts.match(st, filename, file_id)) and
File "/usr/local/lib/python2.7/site-packages/scspell_lib/_corpus.py", line 227, in match
if self._natural_dict.match(token):
AttributeError: 'NoneType' object has no attribute 'match'
I'm imagining the API would provide an iterator over the file, returning results, and spell_check_file would iterate over these, handling them.
This opens the door for programmatic use of scspell.
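A toy sketch of that shape, to make the idea concrete. The names iter_unmatched and report, the simple ASCII token pattern, and the known_words set are all assumptions for illustration; scspell's real matching logic is more involved.

```python
import re

TOKEN = re.compile(r"[A-Za-z]+")


def iter_unmatched(lines, known_words):
    """Yield (line_number, token) for each token not found in known_words."""
    for number, line in enumerate(lines, start=1):
        for token in TOKEN.findall(line):
            if token.lower() not in known_words:
                yield number, token


def report(lines, known_words):
    """One possible consumer: format each result instead of prompting."""
    return ["%d: %s" % item for item in iter_unmatched(lines, known_words)]


print(report(["int recieve = 1;"], {"int"}))  # ['1: recieve']
```

An interactive front end and a report-only front end would then be two different consumers of the same generator.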
The license headers and COPYING.txt included in this project have an incorrect address for the Free Software Foundation, as detected by the rpmlint utility: https://fedoraproject.org/wiki/Common_Rpmlint_issues#incorrect-fsf-address
An updated copy of the GPLv2 can be found here: https://www.gnu.org/licenses/old-licenses/gpl-2.0.txt
Running the following command changed the headers to a correct address:
find scspell -name '*.py' | xargs sed -i '/You should have received a copy of the GNU General Public License/{
N
/You should have received a copy of the GNU General Public License\n.*along with this program; if not, write to the Free Software/{
N
s/\(.*\)You should have received a copy of the GNU General Public License\n\(.*\)along with this program; if not, write to the Free Software\n\(.*\)Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA/\1You should have received a copy of the GNU General Public License along\n\2with this program; if not, write to the Free Software Foundation, Inc.,\n\351 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA./
}
}'
DEPRECATION: scspell3k is being installed using the legacy 'setup.py install' method, because it does not have a 'pyproject.toml' and the 'wheel' package is not installed. pip 23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-pep517' option. Discussion can be found at pypa/pip#8559
When spellchecking a large codebase, one can get a lot of false positives due to variable names that are just not English. Looking for "rare" misspellings (#24) helps (as typos tend to be, well, unique(ish), whereas variable names tend to be reused), but another useful heuristic may be to include the edit distance (e.g. Damerau-Levenshtein) to the closest correct word -- a long word that's just one typo away from a correct word is more likely to indeed be a typo.
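To make the heuristic concrete, here is a sketch using the optimal string alignment distance, a restricted variant of Damerau-Levenshtein that counts an adjacent transposition as one edit. The thresholds (max_distance=1, min_length=6) and the function names are assumptions chosen for the example, not anything scspell implements.

```python
def osa_distance(a, b):
    """Optimal string alignment distance: insert, delete, substitute, transpose."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Adjacent transposition, e.g. "ie" <-> "ei".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def closest_word(token, dictionary):
    """Return (distance, word) for the dictionary word nearest to token."""
    return min((osa_distance(token, word), word) for word in dictionary)


def looks_like_typo(token, dictionary, max_distance=1, min_length=6):
    """Heuristic from the text: a long token one edit away from a known word."""
    distance, _ = closest_word(token, dictionary)
    return len(token) >= min_length and 0 < distance <= max_distance


print(closest_word("recieve", ["receive", "believe"]))  # (1, 'receive')
print(looks_like_typo("recieve", ["receive", "believe"]))  # True
print(looks_like_typo("ndx", ["index"]))  # False
```

Comparing against every dictionary word is quadratic per pair and slow for large dictionaries; a real implementation would likely want an index or a cutoff.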
Is there a way not to check spelling of file names?
tom@computer:~/$ scspell --set-dictionary=/home/tom/Dropbox/work/data/spelling.txt
Traceback (most recent call last):
File "/home/tom/hacking/energysage/env/bin/scspell", line 11, in <module>
sys.exit(main())
File "/home/tom/hacking/energysage/env/local/lib/python2.7/site-packages/scspell/__init__.py", line 899, in main
set_dictionary(args.dictionary)
File "/home/tom/hacking/energysage/env/local/lib/python2.7/site-packages/scspell/__init__.py", line 648, in set_dictionary
config.write(f)
File "/usr/lib/python2.7/ConfigParser.py", line 414, in write
fp.write("\n")
TypeError: write() argument 1 must be unicode, not str
tom@computer:~/$ Python 2.7.13
tom@computer:~/$ pip freeze | grep -i scspell
scspell3k==2.1
It would be nice if scspell supported the pre-commit tool, as has been done with codespell.
We just need to add a .pre-commit-hooks.yaml file at the top level. It would look something like:
- id: scspell
  name: scspell
  description: 'spell checking'
  entry: scspell
  args: [--report-only]
  language: python
  types: [text]
The user would then need to add their source-controlled dictionaries. The user's .pre-commit-config.yaml would contain something like:
- repo: https://github.com/myint/scspell
  rev: [new version #]
  hooks:
    - id: scspell
      args: [--use-builtin-base-dict, --override-dictionary=/path/to/dictionary_file.txt]
Hopefully this configuration would make things "stateless" enough that it could be run across different machines without any issue.
I think it'd be handy for scspell to have directory-recursing
abilities. By way of illustration, here's my attempt at it for one
project:
https://github.com/chapel-lang/chapel/blob/master/util/chplspell
All the complexity of that script is in service of two user
conveniences:
1. The script has a built-in default set of directories and file globs
   to search for within them, and
2. If any files or directories are given on the command line, they are
   used as the base of the search instead of the default ones.
This way, the user can just type chplspell and get all the right
files in the tree spell-checked, with the right dictionaries and
options.
If this functionality were moved inside scspell, then this script
could be replaced by effectively a one-liner, invoking scspell with a
description of those defaults and passing through the rest of the
commandline. And then any other projects that want to use scspell in
a similar manner would only need their own similar one-liner instead
of a complex script.
I don't know what would be the best interface for that. The two
alternatives I've thought of are:
In the first alternative, that "one-liner" would be a very long line,
like the following (more generic files and directories than in the
above script):
#!/bin/bash
exec scspell --defdir doc --defdir man --defdir src \
--defglob "*.c" --defglob "*.h" --defglob "*.cpp" \
--use-builtin-base-dict \
--relative-to $PROJ_HOME \
--override-dictionary $PROJ_HOME/.scspell/dictionary \
"$@"
In the second alternative, it would be a simpler one-liner. The
config file would specify all the other options from the command line
above.
#!/bin/bash
exec scspell --project-config $PROJ_HOME/.scspell/$PROJ.scspell.conf "$@"
Presumably using ConfigParser.
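For what it's worth, here is a sketch of how such a config file might be read with Python 3's configparser. The section name, the option names (borrowed from the proposed flags above), and the whitespace-separated list format are all assumptions; none of this exists in scspell.

```python
import configparser

# Hypothetical config contents mirroring the proposed command-line options.
SAMPLE = """
[scspell]
defdir = doc man src
defglob = *.c *.h *.cpp
use-builtin-base-dict = yes
override-dictionary = .scspell/dictionary
"""


def load_project_config(text):
    """Parse a project config into a plain dict of options."""
    # Interpolation is disabled so values are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["scspell"]
    return {
        "dirs": section.get("defdir", fallback="").split(),
        "globs": section.get("defglob", fallback="").split(),
        "use_builtin_base_dict": section.getboolean(
            "use-builtin-base-dict", fallback=False
        ),
        "override_dictionary": section.get("override-dictionary", fallback=None),
    }


print(load_project_config(SAMPLE)["dirs"])  # ['doc', 'man', 'src']
```

Splitting on whitespace keeps the file short but rules out directory names containing spaces; one option per line would avoid that.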
I'm not sure how to cleanly associate certain of the globs
(e.g. *.tex) with certain behavior (e.g. --no-c-escapes). I could
imagine --deftexglob "*.tex", but in addition to being a little gross,
it'd get grosser if there turn out to be languages that need their own
--no-c-escapes sort of switch. (The above script gets away with
including README* in default globs, and *.tex in the LaTeX globs only
because there's no README.tex in tree.)
Thoughts?
I noticed this while spell-checking code that uses backslashed words for Doxygen, e.g.
/**
* This is a test that uses backslashes.
* \author J. Doe
* \date Today
* \another keyword
*/
The output is, e.g.
test.h:5: Unmatched 'nother' --> {nother}
(i)gnore, (I)gnore all, (r)eplace, (R)eplace all, (a)dd to dictionary, or
show (c)ontext? [i]