Comments (12)
It would be fine with me to say that only a subset of corrections can be automatically applied, so long as the others are still reported. For example, if we add a feature for within-CamelCase spelling errors, it seems reasonable to specify in the docs that these can be reported but not (at the moment) automatically corrected.
from codespell.
This would be amazing but I'm worried it would report a ton of false positives.
from codespell.
Maybe a switch would help avoid that?
from codespell.
I implemented this feature locally a few weeks ago and had no problems with false positives.
The reason I haven't pushed it is that the in-file replace functionality doesn't work with this quick hack. It's the same situation as PR #174, which currently doesn't support the '-w' flag either.
I think at some point we have to discuss how important the write-changes option is and how it influences new features like support for CamelCase, c-escapes or customized regular expressions to split the words. Since I use codespell only for reporting misspellings (it runs on every compile of my projects) I do not care much about this option.
Thoughts?
from codespell.
I agree with @larsoner
from codespell.
See also the discussion in #314.
from codespell.
@thdot where is your CamelCase read-only checking code? I'd love to get it merged in!
from codespell.
An example regex to extract individual words from camelCase and mixedACRONYMSpelling that may be useful (PCRE, though):
/[a-z]+|[A-Z](?:[a-z]+|[A-Z]+(?![a-z]))/
(It is not aware of hyphens or accented characters; the check could be skipped if either were found.)
This was from a project that checks spelling while being aware of PHP's syntax (e.g. single-quoted strings):
https://github.com/TysonAndre/PhanTypoCheck/blob/0.0.3/src/TypoCheckUtils.php#L100-L102
from codespell.
> An example regex to extract individual words from camelCase and mixedACRONYMSpelling that may be useful (PCRE, though)
> /[a-z]+|[A-Z](?:[a-z]+|[A-Z]+(?![a-z]))/
> (not aware of hyphens or accented characters, could skip that check if either were found)
I'd just like to add that my initial solution had quadratic performance for long strings of lowercase letters and should not be used as-is (it didn't specify a start boundary).
'/(?:[a-z][^a-zA-Z]*[A-Z]|_)/'
can be used as a sanity check of whether a word is camelCase or snake_case (only needed if splitting a string is slow).
'/[a-z]+|[A-Z](?:[a-z]+|[A-Z]+(?![a-z]))/'
can be used to extract individual parts
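As a rough illustration of the two patterns above, here is a minimal Python sketch. The patterns are copied from the comment and happen to be valid for Python's `re` module as well; the names `LOOKS_COMPOUND`, `WORD_PARTS`, and `split_identifier` are made up for this example and are not part of codespell.

```python
import re

# Both patterns are taken from the comment above (PCRE syntax that Python's
# `re` also accepts). The names are hypothetical, chosen for this sketch.
LOOKS_COMPOUND = re.compile(r"(?:[a-z][^a-zA-Z]*[A-Z]|_)")
WORD_PARTS = re.compile(r"[a-z]+|[A-Z](?:[a-z]+|[A-Z]+(?![a-z]))")


def split_identifier(token):
    """Split a camelCase or snake_case token into its word parts.

    Tokens that do not look compound are returned unchanged as a
    single-element list, so the extraction pattern is only run when needed.
    """
    if not LOOKS_COMPOUND.search(token):
        return [token]
    return WORD_PARTS.findall(token)


for example in ("defaultNameOccurances", "mixedACRONYMSpelling", "snake_case"):
    print(example, split_identifier(example))
```

Note that, like the original patterns, this sketch ignores digits, hyphens, and accented characters, and a lone uppercase letter with nothing after it (the `X` in `getX`) is dropped.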
from codespell.
Thought I might note that we use codespell in our CI for github.com/rook/rook, and we have a function `AtLeast()` which codespell suggests be `at least`, and I'd love to have the CamelCase support so we can still find accidental `atleast` strings in documents while still allowing `AtLeast()` functions.
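The behaviour asked for here could look roughly like the following report-only sketch. It is not codespell's implementation: `MISSPELLINGS` is a tiny hypothetical stand-in for a real dictionary, `find_typos` is an invented helper, and the splitting pattern is the one posted earlier in this thread.

```python
import re

# Extraction pattern posted earlier in this thread.
WORD_PARTS = re.compile(r"[a-z]+|[A-Z](?:[a-z]+|[A-Z]+(?![a-z]))")

# Hypothetical stand-in for a misspelling dictionary; this is not
# codespell's dictionary format.
MISSPELLINGS = {
    "atleast": "at least",
    "occurance": "occurrence",
    "occurances": "occurrences",
}


def find_typos(token):
    """Return (part, suggestion) pairs for a token; report only, never rewrite."""
    # Only split when an uppercase letter appears after the first character,
    # so that plain words such as "atleast" are still checked as a whole.
    has_inner_upper = any(c.isupper() for c in token[1:])
    parts = WORD_PARTS.findall(token) if has_inner_upper else [token]
    return [(p, MISSPELLINGS[p.lower()]) for p in parts if p.lower() in MISSPELLINGS]


for token in ("AtLeast", "atleast", "defaultNameOccurances"):
    print(token, find_typos(token))
```

With this approach `AtLeast` is checked as `At` and `Least` and is not reported, while a stray `atleast` in prose still is.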
from codespell.
I just want to mention that I would be perfectly happy with detection-only camel case support, even with quadratic runtime. (I found `defaultNameOccurances` only by accident, because `occurance` also existed as a standalone typo.)
from codespell.
I mean, the one that I posted fixed the quadratic runtime I had with my first attempt, so maintainers should avoid introducing similar bugs however they solve it.
The buggy one was /(?:[a-z].*[A-Z]|_)/
and was slow for long lowercase strings because it caused backtracking and had to be retried from every possible start position.
The fixed one was /[a-z]+|[A-Z](?:[a-z]+|[A-Z]+(?![a-z]))/
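To see the difference in practice, a small timing sketch in Python may help. It compares the buggy check pattern above with the bounded sanity-check pattern `(?:[a-z][^a-zA-Z]*[A-Z]|_)` posted earlier in this thread, on all-lowercase input of growing length. The helper name `timed_search` is made up, and the absolute numbers depend on the machine and regex engine, so none are claimed here.

```python
import re
import time

# Buggy check: from each start position, `.*` runs to the end of the string
# and then backtracks looking for an uppercase letter.
BUGGY_CHECK = re.compile(r"(?:[a-z].*[A-Z]|_)")
# Bounded check from earlier in the thread: fails right after each lowercase
# letter unless non-letters or an uppercase letter follow.
BOUNDED_CHECK = re.compile(r"(?:[a-z][^a-zA-Z]*[A-Z]|_)")


def timed_search(pattern, text):
    """Return (found, elapsed_seconds) for one search of `text`."""
    start = time.perf_counter()
    found = pattern.search(text) is not None
    return found, time.perf_counter() - start


for n in (500, 1000, 2000):
    text = "a" * n  # worst case: long run of lowercase letters, no match
    _, slow = timed_search(BUGGY_CHECK, text)
    _, fast = timed_search(BOUNDED_CHECK, text)
    print(f"n={n}: buggy {slow:.4f}s, bounded {fast:.4f}s")
```

Doubling `n` should roughly quadruple the time for the buggy pattern, while the bounded one should grow roughly linearly.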
from codespell.