gospell

******************************** WORK IN PROGRESS ****************************
******************************************************************************

This project attempts to automate spell checking for comments in Go code. Most
modern languages rely on comments for documentation, and I was curious to see
to what degree this can be automated. It's split into a few packages:

1) Package "check" contains generic logic for spell checking given an alphabet
   and dictionary. It works as a standalone spell checking package.
2) Package "lang" stores dictionary data and currently only supports
   English (US).
3) Package "scrape" has logic to scrape data from Merriam-Webster's online
   dictionary.
4) Package "main" builds a binary to run predefined or tunable spell checkers
   against specific files or recursively on a directory. Java, C, C++, and
   Scala files are also supported, with the default being Go.

So what's considered a misspelling?

Since comments often contain code expressions rather than valid grammar, it's
not reasonable to simply check each space-delimited string against a
dictionary. By default, a word is classified as misspelled if it:

    - has at least 5 characters
        and
            - differs from a dictionary word by 1 character insertion
            or
            - differs from a dictionary word by 1 character deletion
            or
            - differs from a dictionary word by a single swap of two
              consecutive characters
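
The default rule above can be sketched roughly as follows. This is an
illustration only, not the code in package "check": the names isMisspelled,
oneEditAway, isOneDeletion, and isAdjacentSwap are hypothetical, the dictionary
is a plain slice scanned linearly, and the sketch assumes that a word found
verbatim in the dictionary is never flagged.

```go
package main

import "fmt"

// minLength mirrors the default described above: shorter words are skipped.
const minLength = 5

// isOneDeletion reports whether removing exactly one rune from long yields
// short. The caller guarantees len(long) == len(short)+1.
func isOneDeletion(long, short []rune) bool {
	i := 0
	for i < len(short) && long[i] == short[i] {
		i++
	}
	// long[i] is the extra rune; everything after it must line up.
	for ; i < len(short); i++ {
		if long[i+1] != short[i] {
			return false
		}
	}
	return true
}

// isAdjacentSwap reports whether a and b (equal length) differ by exactly one
// swap of two neighboring runes.
func isAdjacentSwap(a, b []rune) bool {
	for i := 0; i+1 < len(a); i++ {
		if a[i] == b[i] {
			continue
		}
		if a[i] != b[i+1] || a[i+1] != b[i] {
			return false
		}
		for j := i + 2; j < len(a); j++ {
			if a[j] != b[j] {
				return false
			}
		}
		return true
	}
	return false
}

// oneEditAway reports whether word is one insertion, one deletion, or one
// adjacent swap away from target. Substitutions are deliberately not counted.
func oneEditAway(word, target string) bool {
	w, t := []rune(word), []rune(target)
	switch len(w) - len(t) {
	case 0:
		return isAdjacentSwap(w, t)
	case 1:
		return isOneDeletion(w, t) // word has one extra character
	case -1:
		return isOneDeletion(t, w) // word is missing one character
	}
	return false
}

// isMisspelled flags a word that is long enough, is not itself in the
// dictionary (an assumption of this sketch), and is one edit from an entry.
func isMisspelled(word string, dict []string) bool {
	if len([]rune(word)) < minLength {
		return false
	}
	near := false
	for _, entry := range dict {
		if entry == word {
			return false
		}
		if oneEditAway(word, entry) {
			near = true
		}
	}
	return near
}

func main() {
	dict := []string{"receive", "comment", "dictionary"}
	for _, w := range []string{"recieve", "commment", "dictionry", "comment", "recv"} {
		fmt.Printf("%-10s misspelled=%v\n", w, isMisspelled(w, dict))
	}
}
```

Requiring a near match, rather than flagging every unknown token, is what keeps
identifiers and code fragments in comments from being reported.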

To run the classifier on a Go project:

    $ ./gospell .

Tuning support is minimal. For example, to check words that:

    - are at least 4 characters long
    and
    - differ by at most 2 insertions

run:

    $ ./gospell -ml=4 -mi=2 .
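
For readers unfamiliar with Go's flag handling, here is a minimal sketch of how
two integer flags like these could be parsed with the standard "flag" package.
Only the flag names -ml and -mi come from the usage above; the options struct,
the parseOptions function, the help strings, and the assumed defaults (5 and 1,
taken from the default rule) are hypothetical and may not match gospell's
actual code.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the two tunables shown above. Field names are hypothetical.
type options struct {
	minLength     int      // set by -ml
	maxInsertions int      // set by -mi
	paths         []string // remaining positional arguments (files or directories)
}

// parseOptions parses args (without the program name) into options.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("gospell", flag.ContinueOnError)
	fs.IntVar(&o.minLength, "ml", 5, "minimum word length to check (assumed default)")
	fs.IntVar(&o.maxInsertions, "mi", 1, "maximum character insertions (assumed default)")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	o.paths = fs.Args()
	return o, nil
}

func main() {
	o, err := parseOptions(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("min length %d, max insertions %d, paths %v\n",
		o.minLength, o.maxInsertions, o.paths)
}
```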

This is a decent start, but results often need pruning by a human eye. It may
be worth exploring the following features:

    - a frequency threshold: a flagged word that occurs often enough is no
      longer classified as a misspelling
    - insertion, deletion, and swap limits that adapt to the length of the
      word, so longer words can differ by more changes
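
To make the two ideas above concrete, here is a rough sketch. Neither feature
exists in gospell as described here; the function names maxEdits and
pruneFrequent, the length cutoffs, and the threshold semantics are all
assumptions chosen for illustration.

```go
package main

import "fmt"

// maxEdits returns a hypothetical edit budget that grows with word length.
// The cutoffs (5 and 10) are arbitrary placeholders, not tuned values.
func maxEdits(wordLen int) int {
	switch {
	case wordLen < 5:
		return 0 // too short to check at all
	case wordLen < 10:
		return 1
	default:
		return 2
	}
}

// pruneFrequent drops flagged words that occur at least threshold times, on
// the assumption that a token repeated that often is more likely an
// identifier or jargon than a typo. Each kept word is returned once, in order
// of first appearance.
func pruneFrequent(flagged []string, threshold int) []string {
	counts := make(map[string]int)
	for _, w := range flagged {
		counts[w]++
	}
	seen := make(map[string]bool)
	var kept []string
	for _, w := range flagged {
		if counts[w] < threshold && !seen[w] {
			seen[w] = true
			kept = append(kept, w)
		}
	}
	return kept
}

func main() {
	flagged := []string{"recieve", "strconv", "strconv", "strconv"}
	fmt.Println(pruneFrequent(flagged, 3))
	fmt.Println(maxEdits(4), maxEdits(7), maxEdits(12))
}
```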

Contributors

tshprecher, sten0
