myint / scspell

Spell checker for source code

Home Page: https://pypi.python.org/pypi/scspell3k

License: GNU General Public License v2.0

Python 99.62% Makefile 0.38%

scspell's People

Contributors

cassella pbogut urmastalimaa bminier epage dhruvsomani jayvdb robotdana giosad jurajpelikan guttume pytis fininicus wis77 sadankhan richb2k

scspell's Issues

Include tests in source tarball

Create a dictionary from a directory

Probably requires #22

It'd be handy to be able to easily create dictionaries from existing libraries, especially the standard libraries of various languages. The idea is:

Assume the input is already spell-checked
Create a new dictionary file based on the code
Optionally subtract out content from the base dictionary.
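A rough sketch of that workflow, under several assumptions: build_dictionary is a hypothetical helper (not part of scspell), tokens are taken to be runs of Unicode letters (camelCase is not split), words are lowercased, and the result is a plain sorted word list rather than scspell's own dictionary file format:

```python
import re
from pathlib import Path

# Runs of Unicode letters; digits and underscores act as separators.
TOKEN_RE = re.compile(r"[^\W\d_]+")

def build_dictionary(root, base_words=(), suffixes=(".py",)):
    """Collect lowercase tokens from matching files under root, minus base_words.

    Assumes every file under root is already correctly spelled.
    """
    words = set()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            words.update(token.lower() for token in TOKEN_RE.findall(text))
    # Optional step: subtract words already covered by a base dictionary.
    base = {word.lower() for word in base_words}
    return sorted(words - base)
```

Printing the returned list one word per line would give a starting point that could then be reviewed and converted into whatever format the dictionary file needs.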

scspell splits tokens at diacritics inside words

E.g., for händler, scspell finds the token ändler under Python 3.7 and the token ndler under Python 2.7.

Words with other diacritics have the same problem.
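The Python 2.7 result is what an ASCII-only letter class produces. scspell's actual tokenizing regular expression is not quoted in this report, so the comparison below only illustrates the general difference between an ASCII-only class and a Unicode-aware one in Python's re module:

```python
import re

word = "händler"

# An ASCII-only character class stops at "ä" and splits the word in two.
ascii_tokens = re.findall(r"[A-Za-z]+", word)

# In Python 3, [^\W\d_] matches any Unicode letter, so the word stays whole.
unicode_tokens = re.findall(r"[^\W\d_]+", word)

print(ascii_tokens)    # ['h', 'ndler']
print(unicode_tokens)  # ['händler']
```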

Change default tokenizers

All alphanumeric strings (strings of letters, numbers, and underscores) are spell-checked tokens.

It would be nice to have an option for defining additional (or all) token separators. For example, I want underscores to act as separators, but not dashes.
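One way such an option could work is to build the token pattern from a user-supplied set of characters that count as part of a token, with every other character acting as a separator. This is a sketch only: make_tokenizer and its joiners parameter are hypothetical, not scspell options, and it covers ASCII letters only for brevity:

```python
import re

def make_tokenizer(joiners="_"):
    """Return a function that extracts tokens made of ASCII letters, digits and `joiners`.

    The default joiners="_" keeps underscores inside tokens, matching the
    default quoted in this issue; any character not listed separates tokens.
    """
    pattern = re.compile("[A-Za-z0-9" + re.escape(joiners) + "]+")
    return pattern.findall

default_tokenize = make_tokenizer()   # underscores join, dashes split
dash_tokenize = make_tokenizer("-")   # dashes join, underscores split

print(default_tokenize("foo_bar baz-qux"))  # ['foo_bar', 'baz', 'qux']
print(dash_tokenize("foo_bar baz-qux"))     # ['foo', 'bar', 'baz-qux']
```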

Errors on Windows

I'm getting unusual errors on Windows.
For example, when I call scspell using a subprocess in my Python application, I get:

E             File "C:\Python34-x64\lib\site-packages\coalib\misc\Shell.py", line 70, in run_interactive_shell_command
E               process = Popen(command, **args)
E             File "C:\Python34-x64\lib\subprocess.py", line 859, in __init__
E               restore_signals, start_new_session)
E             File "C:\Python34-x64\lib\subprocess.py", line 1114, in _execute_child
E               startupinfo)
E           OSError: [WinError 193] %1 is not a valid Win32 application

This happens for scspell.py --report-only C:\path\to\file; for scspell --report-only C:\path\to\file, I simply get a FileNotFound error.
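WinError 193 is what Windows reports when asked to execute a file that is not a native executable, such as a .py script passed directly to Popen. Until a native launcher is available, a caller-side workaround is to run the script through the current interpreter. A minimal sketch; build_command is a hypothetical helper, not part of scspell:

```python
import sys

def build_command(script, *args):
    """Build an argv list for subprocess.Popen that also works on Windows.

    Windows cannot execute a .py file directly (WinError 193), so .py scripts
    are run through the current Python interpreter instead.
    """
    if script.lower().endswith(".py"):
        return [sys.executable, script, *args]
    return [script, *args]

# e.g. subprocess.Popen(build_command("scspell.py", "--report-only", r"C:\path\to\file"))
```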

Maybe it's because you're using distutils? Using setuptools entry points may fix it. (We use setuptools in our app and it works well on both Windows and Linux.)
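For reference, a console_scripts entry point looks roughly like the following. This is a sketch, not scspell's actual setup.py: the distribution name scspell3k comes from the PyPI page, the scspell:main target is inferred from the scspell/__init__.py main() frame in the python2 set-dictionary traceback on this page, and all other metadata is omitted. When pip installs a package declared this way, it generates a native scspell.exe launcher on Windows.

```python
# setup.py (sketch)
from setuptools import setup

setup(
    name="scspell3k",
    packages=["scspell"],
    entry_points={
        # pip turns this into a "scspell" launcher (scspell.exe on Windows)
        # that imports the scspell package and calls its main() function.
        "console_scripts": ["scspell = scspell:main"],
    },
)
```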

option to report list of unique misspellings

It would be great if scspell could print just a unique list of misspelled tokens. Lots of tools enable me to find the misspelled word in its context after the fact (ag, grep). I just want to use scspell's awesome token parsing abilities. Currently I'm doing something like this:

scspell --report-only files... | cut -d' ' -f 2 | sort -u

but it seems like it would be much more efficient to do it in the program itself.
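Until such an option exists, the pipeline above can be replaced with a small Python function. This sketch assumes, as the cut command does, that the token is the second space-separated field of each report line; unique_tokens is a hypothetical name:

```python
def unique_tokens(lines):
    """Return the sorted, de-duplicated second space-separated field of each line.

    Mirrors `cut -d' ' -f 2 | sort -u` (sorting by code point, like
    LC_ALL=C sort -u), except that lines without a space are skipped,
    whereas cut would print them unchanged.
    """
    tokens = set()
    for line in lines:
        fields = line.rstrip("\n").split(" ")
        if len(fields) >= 2:
            tokens.add(fields[1])
    return sorted(tokens)
```

A script could call unique_tokens(sys.stdin) and print one token per line, so it slots into the same pipe after scspell --report-only.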

Allow directory walking

I'm checking a rather large codebase. Right now I'm working entirely in the /src/ folder, and scspell src/* works, except when it reaches subdirectories: I get Error: can't read source file 'src/subdir'; skipping (reason: Is a directory). It would be super nice to have a command-line argument to specify how many levels of subdirectories to traverse.
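A depth-limited walk of this kind can be built on os.walk by pruning the directory list once the limit is reached. A sketch only; collect_files and max_depth are hypothetical and not existing scspell options:

```python
import os

def collect_files(root, max_depth):
    """Return files under root, descending at most max_depth directory levels.

    max_depth=0 lists only root's own files; max_depth=1 adds its immediate
    subdirectories, and so on.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if depth >= max_depth:
            dirnames[:] = []  # prune in place: os.walk will not descend further
        else:
            dirnames.sort()   # deterministic traversal order
        found.extend(os.path.join(dirpath, name) for name in sorted(filenames))
    return found
```

The resulting paths could then be passed to scspell as ordinary file arguments, which avoids the "Is a directory" error entirely.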

Allow disabling/ignoring some lines using a disable-scspell comment

Sometimes files contain text in different languages, which you do not want to add to a file-specific dictionary.

It would be nice to have an option for disabling scspell on some lines of a file, like:

// disable-scspell
bonjour monde
hallo welt
//enable-scspell

(Other comment styles could work too, e.g. /* disable-scspell */.)

Linting tools such as eslint, psalm and phpstan offer a mechanism like this, so I think it would be useful here too.
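A sketch of how such markers could be honoured before tokenizing: lines inside a disabled region (and the marker lines themselves) are blanked rather than removed, so reported line numbers stay correct. The marker spelling follows the proposal; mask_disabled is a hypothetical helper, and matching the marker anywhere in a line is an assumption that makes it work with any comment style:

```python
import re

DISABLE_RE = re.compile(r"\bdisable-scspell\b")
ENABLE_RE = re.compile(r"\benable-scspell\b")

def mask_disabled(lines):
    """Blank out lines between disable-scspell and enable-scspell markers."""
    out = []
    disabled = False
    for line in lines:
        if DISABLE_RE.search(line):
            disabled = True
            out.append("")  # the marker comment itself is not spell-checked
        elif ENABLE_RE.search(line):
            disabled = False
            out.append("")
        else:
            out.append("" if disabled else line)
    return out
```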

Add to dictionary from command line

First of all, nice tool, thanks for doing that.

Is there a way to add a word to dictionary from command line (without interactive mode)?

I'm using this as a linter. It works great; however, in Vim I can't (well, I could, but I don't want to) use interactive mode. I'd still like a way to add a word to the dictionary, so something like scspell --add-to-natural-dictionary myfancyword would be great.
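A sketch of what the proposed flag might do. Everything here is hypothetical: the --dictionary and --add-to-natural-dictionary options are not existing scspell arguments, and the file is assumed to be a plain one-word-per-line list, which is not scspell's actual dictionary format:

```python
import argparse

def add_word(path, word):
    """Append word to a one-word-per-line file; return False if already present."""
    try:
        with open(path, encoding="utf-8") as f:
            text = f.read()
    except FileNotFoundError:
        text = ""
    if word in text.split():
        return False
    with open(path, "a", encoding="utf-8") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # keep the new word on its own line
        f.write(word + "\n")
    return True

def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--dictionary", required=True)
    parser.add_argument("--add-to-natural-dictionary", metavar="WORD")
    args = parser.parse_args(argv)
    if args.add_to_natural_dictionary:
        add_word(args.dictionary, args.add_to_natural_dictionary)
```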

AttributeError: 'NoneType' object has no attribute 'match'

I get the following error while running scspell on https://github.com/k4rtik/htdp/blob/master/htdp-lib/lang/private/tp-dialog.rkt

$ scspell tp-dialog.rkt
Traceback (most recent call last):
  File "/usr/local/bin/scspell", line 83, in <module>
    scspell_lib.spell_check(files, opts.override_filename, opts.report)
  File "/usr/local/lib/python2.7/site-packages/scspell_lib/__init__.py", line 617, in spell_check
    spell_check_file(f, dicts, ignores, report_only)
  File "/usr/local/lib/python2.7/site-packages/scspell_lib/__init__.py", line 504, in spell_check_file
    data, m), filename, file_id, dicts, ignores, report_only)
  File "/usr/local/lib/python2.7/site-packages/scspell_lib/__init__.py", line 448, in spell_check_token
    (not dicts.match(st, filename, file_id)) and
  File "/usr/local/lib/python2.7/site-packages/scspell_lib/_corpus.py", line 227, in match
    if self._natural_dict.match(token):
AttributeError: 'NoneType' object has no attribute 'match'

API for getting all of the spelling corrections

I'm imagining an API that provides an iterator over a file's results; spell_check_file would then iterate over these and handle each one.

This opens the door for programmatic use of scspell.
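The generator-based split could look roughly like this. It is a sketch, not scspell's code: Finding, iter_findings and report are hypothetical names, the tokenizer is simplified to runs of letters, and is_known stands in for the real dictionary lookup:

```python
import re
from typing import Callable, Iterable, Iterator, NamedTuple

TOKEN_RE = re.compile(r"[^\W\d_]+")

class Finding(NamedTuple):
    line: int
    token: str

def iter_findings(lines: Iterable[str], is_known: Callable[[str], bool]) -> Iterator[Finding]:
    """Yield one Finding per unknown token; callers decide how to handle it."""
    for lineno, text in enumerate(lines, start=1):
        for token in TOKEN_RE.findall(text):
            if not is_known(token.lower()):
                yield Finding(lineno, token)

def report(lines, is_known):
    """One possible consumer: the non-interactive report."""
    for finding in iter_findings(lines, is_known):
        print(f"{finding.line}: {finding.token}")
```

An interactive front end would be just another consumer of the same iterator, which is what makes programmatic use straightforward.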

Wrong FSF address in license

The license headers and COPYING.txt included in this project have an incorrect address for the Free Software Foundation, as detected by the rpmlint utility: https://fedoraproject.org/wiki/Common_Rpmlint_issues#incorrect-fsf-address

An updated copy of the GPLv2 can be found here: https://www.gnu.org/licenses/old-licenses/gpl-2.0.txt

Running the following command changed the headers to the correct address:

find scspell -name '*.py' | xargs sed -i '/You should have received a copy of the GNU General Public License/{
  N
  /You should have received a copy of the GNU General Public License\n.*along with this program; if not, write to the Free Software/{
    N
    s/\(.*\)You should have received a copy of the GNU General Public License\n\(.*\)along with this program; if not, write to the Free Software\n\(.*\)Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA/\1You should have received a copy of the GNU General Public License along\n\2with this program; if not, write to the Free Software Foundation, Inc.,\n\351 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA./
  }
}'
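The command above relies on GNU sed (BSD sed, for example, requires an argument to -i). A portable Python sketch of the same substitution, mirroring the sed expression's three captured line prefixes; fix_fsf_address and fix_tree are hypothetical helper names:

```python
import re
from pathlib import Path

# (.*) captures each line's comment prefix, like \(.*\) in the sed script.
OLD_ADDRESS = re.compile(
    r"(.*)You should have received a copy of the GNU General Public License\n"
    r"(.*)along with this program; if not, write to the Free Software\n"
    r"(.*)Foundation, Inc\., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA"
)
NEW_ADDRESS = (
    r"\1You should have received a copy of the GNU General Public License along\n"
    r"\2with this program; if not, write to the Free Software Foundation, Inc.,\n"
    r"\g<3>51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA."  # \g<3> avoids "\351"
)

def fix_fsf_address(text):
    return OLD_ADDRESS.sub(NEW_ADDRESS, text)

def fix_tree(root):
    """Rewrite every *.py file under root whose header contains the old address."""
    for path in Path(root).rglob("*.py"):
        text = path.read_text(encoding="utf-8")
        new = fix_fsf_address(text)
        if new != text:
            path.write_text(new, encoding="utf-8")
```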

Warning during pip installation

DEPRECATION: scspell3k is being installed using the legacy 'setup.py install' method, because it does not have a 'pyproject.toml' and the 'wheel' package is not installed. pip 23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-pep517' option. Discussion can be found at pypa/pip#8559

Suggestion: show edit distance to closest correct word

When spell-checking a large codebase, one can get a lot of false positives from variable names that are just not English. Looking for "rare" misspellings (#24) helps, since typos tend to be, well, unique(ish), whereas variable names tend to be reused. Another useful heuristic may be to include the edit distance (e.g. Damerau-Levenshtein) to the closest correct word: a long word that is just one typo away from a correct word is more likely to be an actual typo.
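A sketch of the heuristic using optimal string alignment distance, the restricted variant of Damerau-Levenshtein (it counts an adjacent transposition such as "ie"/"ei" as one edit). The function names are hypothetical, and the linear scan over the dictionary is only meant for illustration; a real implementation would need something faster for large dictionaries:

```python
def osa_distance(a, b):
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def closest_word(token, dictionary):
    """Return (distance, word) for the nearest dictionary word, or None if empty."""
    return min(((osa_distance(token, w), w) for w in dictionary), default=None)

print(closest_word("recieve", ["receive", "deceive", "perceive"]))  # (1, 'receive')
```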

Do not check file names

Is there a way to skip spell-checking file names?

python2 set-dictionary issue

tom@computer:~/$ scspell --set-dictionary=/home/tom/Dropbox/work/data/spelling.txt
Traceback (most recent call last):
  File "/home/tom/hacking/energysage/env/bin/scspell", line 11, in <module>
    sys.exit(main())
  File "/home/tom/hacking/energysage/env/local/lib/python2.7/site-packages/scspell/__init__.py", line 899, in main
    set_dictionary(args.dictionary)
  File "/home/tom/hacking/energysage/env/local/lib/python2.7/site-packages/scspell/__init__.py", line 648, in set_dictionary
    config.write(f)
  File "/usr/lib/python2.7/ConfigParser.py", line 414, in write
    fp.write("\n")
TypeError: write() argument 1 must be unicode, not str
tom@computer:~/$ Python 2.7.13
tom@computer:~/$ pip freeze | grep -i scspell
scspell3k==2.1

Add pre-commit support

It would be nice if scspell supported the pre-commit tool, as codespell does.

We just need to add a .pre-commit-hooks.yaml file at the top level. It would look something like:

-   id: scspell
    name: scspell
    description: 'spell checking'
    entry: scspell
    args: [--report-only]
    language: python
    types: [text]

Users would then need to add their source-controlled dictionaries. The user's .pre-commit-config.yaml would contain something like:

- repo: https://github.com/myint/scspell
  rev: [new version #]
  hooks:
    - id: scspell
      args: [--use-builtin-base-dict,--override-dictionary=/path/to/dictionary_file.txt]

Hopefully this configuration would make things "stateless" enough that it could be run across different machines without any issue.

scspell should support directory traversal

I think it'd be handy for scspell to have directory-recursing
abilities. By way of illustration, here's my attempt at it for one
project:

https://github.com/chapel-lang/chapel/blob/master/util/chplspell

All the complexity of that script is in service of two user
conveniences:

The script has a built-in default set of directories and file globs
to search for within them, and
If any files or directories are given on the command line, they are
used as the base of the search instead of the default ones.

This way, the user can just type chplspell and get all the right
files in the tree spell-checked, with the right dictionaries and
options.

If this functionality were moved inside scspell, then this script
could be replaced by effectively a one-liner, invoking scspell with a
description of those defaults and passing through the rest of the
command line. Then any other projects that want to use scspell in
a similar manner would only need their own similar one-liner instead
of a complex script.

I don't know what would be the best interface for that. The two
alternatives I've thought of are:

Pass everything as commandline arguments.
Pass everything through a config file.

In the first alternative, that "one-liner" would be a very long line,
like the following (more generic files and directories than in the
above script):

#!/bin/bash
exec scspell --defdir doc --defdir man --defdir src \
             --defglob "*.c" --defglob "*.h" --defglob "*.cpp" \
             --use-builtin-base-dict \
             --relative-to "$PROJ_HOME" \
             --override-dictionary "$PROJ_HOME/.scspell/dictionary" \
             "$@"

In the second alternative, it would be a simpler one-liner. The
config file would specify all the other options from the command line
above.

#!/bin/bash
exec scspell --project-config "$PROJ_HOME/.scspell/$PROJ.scspell.conf" "$@"

Presumably using ConfigParser.
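A sketch of the config-file alternative with Python's configparser. The [scspell] section and the defdirs/defglobs keys are hypothetical, chosen to mirror the --defdir/--defglob switches proposed above; directories and globs are whitespace-separated lists, and missing directories are skipped:

```python
import configparser
from pathlib import Path

def files_from_config(config_text, project_home):
    """Expand hypothetical [scspell] defdirs/defglobs keys into a sorted file list."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["scspell"]
    dirs = section.get("defdirs", "").split()
    globs = section.get("defglobs", "").split()
    home = Path(project_home)
    found = set()
    for d in dirs:
        base = home / d
        if not base.is_dir():
            continue  # a default directory may not exist in every checkout
        for pattern in globs:
            # rglob searches base and all of its subdirectories
            found.update(p for p in base.rglob(pattern) if p.is_file())
    return sorted(found)
```

For example, defdirs = doc man src with defglobs = *.c *.h *.cpp would reproduce the defaults of the long command line; files given explicitly on the command line could simply bypass this expansion.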

I'm not sure how to cleanly associate certain globs
(e.g. *.tex) with certain behavior (e.g. --no-c-escapes). I could
imagine --deftexglob "*.tex", but besides being a little gross,
it would get grosser if some languages turn out to need their own
--no-c-escapes sort of switch. (The above script gets away with
including README* in the default globs and *.tex in the LaTeX globs only
because there's no README.tex in the tree.)

Thoughts?

scspell does not match backslashed words

I noticed this while spell-checking code that uses backslashed commands for Doxygen, e.g.

/**
 * This is a test that uses backslashes.
 * \author J. Doe
 * \date Today
 * \another keyword
 */

The output is, e.g.

test.h:5: Unmatched 'nother' --> {nother}
   (i)gnore, (I)gnore all, (r)eplace, (R)eplace all, (a)dd to dictionary, or
   show (c)ontext? [i]
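The token nother is exactly what remains if the \a at the start of \another is consumed as a C escape sequence (the bell character), which suggests that C-escape handling is involved. The sketch below only illustrates that effect; it is not scspell's tokenizer, the tokens function and its c_escapes parameter are hypothetical, and the escape list is a small subset of C's:

```python
import re

TOKEN_RE = re.compile(r"[A-Za-z]+")
# A few single-letter C escapes, for illustration only.
C_ESCAPE_RE = re.compile(r"\\[abfnrtv]")

def tokens(text, c_escapes=True):
    """Tokenize text, optionally dropping C escape sequences first."""
    if c_escapes:
        text = C_ESCAPE_RE.sub(" ", text)  # "\another" loses its "\a"
    return TOKEN_RE.findall(text)

line = r" * \another keyword"
print(tokens(line))                   # ['nother', 'keyword']
print(tokens(line, c_escapes=False))  # ['another', 'keyword']
```

Turning escape handling off keeps Doxygen commands whole, but it would mis-tokenize genuine escapes such as Hello\nWorld, which is presumably why it is a per-file choice; whether the --no-c-escapes switch mentioned in the directory-traversal issue resolves this case is not confirmed by the report.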