Comments (4)
Thanks for taking a close look under the hood! You're right that the variable names don't match. But I believe the values used in the function (and therefore the net effect) do match.
```python
import argparse

parser = argparse.ArgumentParser()  # setup line added so the excerpt runs on its own
parser.add_argument('--text_threshold', default=0.7, type=float, help='text confidence threshold')
parser.add_argument('--low_text', default=0.4, type=float, help='text low-bound score')
parser.add_argument('--link_threshold', default=0.4, type=float, help='link confidence threshold')
```
[source]
The variable names are hard for me to keep in my head, so I made a table to summarize the differences.
| purpose | keras-ocr variable name | keras-ocr variable value | CRAFT-pytorch variable name | CRAFT-pytorch variable value |
|---|---|---|---|---|
| threshold the text map | `text_threshold` | 0.4 | `low_text` | 0.4 |
| threshold the link map | `link_threshold` | 0.4 | `link_threshold` | 0.4 |
| filter out detections | `detection_threshold` | 0.7 | `text_threshold` | 0.7 |
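To cross-check the table, here is a small sketch that maps each CRAFT-pytorch argument to the keras-ocr parameter with the same role and confirms that the defaults line up. The dictionaries only restate the table. `CRAFT_TO_KERAS_OCR` and `translate` are hypothetical names and are not part of either library.

```python
# Defaults restated from the table above (not imported from either library).
CRAFT_DEFAULTS = {"text_threshold": 0.7, "low_text": 0.4, "link_threshold": 0.4}
KERAS_OCR_DEFAULTS = {"detection_threshold": 0.7, "text_threshold": 0.4, "link_threshold": 0.4}

# Hypothetical mapping: CRAFT-pytorch argument name -> keras-ocr parameter name.
CRAFT_TO_KERAS_OCR = {
    "low_text": "text_threshold",             # thresholds the text map
    "link_threshold": "link_threshold",       # thresholds the link map
    "text_threshold": "detection_threshold",  # filters out detections
}


def translate(craft_kwargs):
    """Rename CRAFT-pytorch threshold arguments to their keras-ocr equivalents."""
    return {CRAFT_TO_KERAS_OCR[name]: value for name, value in craft_kwargs.items()}


# After renaming, the CRAFT-pytorch defaults equal the keras-ocr defaults.
assert translate(CRAFT_DEFAULTS) == KERAS_OCR_DEFAULTS
```

Passing CRAFT-pytorch's `text_threshold=0.7` directly as keras-ocr's `text_threshold` would not raise the detection filter. It would change how the text map is binarized, which is easy to do by accident when the names overlap.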
That still leaves us with the question of whether we should change the variable names to match the original implementation. In my humble opinion, the new names are more semantically descriptive. `text_threshold` and `link_threshold` are used in similar ways, so I think it makes sense for their variable names to have a similar structure. This is in contrast with the `low_text` / `link_threshold` naming, which, to me, implies that these values are used in different ways. Could you share your thoughts on that?
I'd also appreciate you checking to see if I've made a mistake above -- this can all be a little confusing and I may have misread something.
from keras-ocr.
You're right, your values do match the original implementation, so there's no impact on post-processing. It was, however, a little confusing when I adjusted the values and wasn't seeing the effects I expected.
I do agree with you about not changing the variable names to match the original ones. Yours are more descriptive, particularly `detection_threshold`, which is used with the connected-component labeling that actually does the "detecting". I also like how you added `size_threshold`, which was originally just hardcoded.
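To make the roles of `detection_threshold` and `size_threshold` concrete, here is a simplified sketch of the filtering step that follows connected-component labeling. It assumes the labels have already been computed, for example with OpenCV's `cv2.connectedComponentsWithStats`. Plain nested lists stand in for the arrays, and `keep_components` is a hypothetical helper, not keras-ocr's actual code. The comparison operators are also an assumption: a component is dropped when its area is below `size_threshold` or its peak confidence is below `detection_threshold`.

```python
def keep_components(labels, text_map, detection_threshold, size_threshold):
    """Return the ids of labeled components that survive both filters.

    labels:   2-D list of ints, 0 = background, 1..N = component ids
    text_map: 2-D list of text confidences (0.0 to 1.0), same shape as labels
    """
    sizes = {}  # component id -> pixel count (area)
    peak = {}   # component id -> maximum text confidence inside the component
    for label_row, score_row in zip(labels, text_map):
        for component_id, score in zip(label_row, score_row):
            if component_id == 0:
                continue  # skip background pixels
            sizes[component_id] = sizes.get(component_id, 0) + 1
            peak[component_id] = max(peak.get(component_id, 0.0), score)

    kept = []
    for component_id in sorted(sizes):
        if sizes[component_id] < size_threshold:
            continue  # too small to be a word (assumed size_threshold rule)
        if peak[component_id] < detection_threshold:
            continue  # uniformly low-confidence region (assumed detection_threshold rule)
        kept.append(component_id)
    return kept


labels = [
    [1, 1, 0, 2, 2],
    [1, 1, 0, 2, 2],
    [0, 0, 0, 0, 3],
]
text_map = [
    [0.9, 0.5, 0.0, 0.5, 0.6],
    [0.5, 0.5, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 0.0, 0.95],
]
print(keep_components(labels, text_map, detection_threshold=0.7, size_threshold=2))  # [1]
```

Component 2 is large enough, but its peak confidence (0.6) never reaches 0.7. Component 3 is confident but only one pixel in size. Only component 1 passes both checks.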
As such, I think a little more documentation that highlights the meaning and usage of these params is all you need. A docstring in the source would probably suffice, as most users likely won't want, need, or know to alter these for their use case. This discussion in the original repo does a pretty good job of describing each one's purpose in plain English.
I think updating the docstring is a fine idea. Unfortunately, the description in the linked comment defines the variables in terms of their perceived effect rather than how they are actually used. Below are what I believe to be more accurate definitions, which describe how each value is used in addition to its effect. I'm interested in your feedback on them. If you agree, I'll add them to the docstring for `detector.detect`. Or, since it was your idea, you're welcome to file a PR. I'd very much like you to get credit!
- `text_threshold`: When the text map is processed, it is converted from confidence values (floats from zero to one) to classifications (0 for not text, 1 for text) using binary thresholding. The `text_threshold` value determines the breakpoint at which a value is converted to a 1 or a 0. For example, if `text_threshold` is 0.4 and a value for a particular point on the text map is 0.5, that value gets converted to a 1. The higher this value is, the less likely it is that characters will be merged together into a single word. The lower this value is, the more likely it is that non-text will be detected. Therein lies the balance.
- `link_threshold`: This is the same as `text_threshold`, but is applied to the link map instead of the text map.
- `detection_threshold`: We want to avoid including boxes that may represent large regions of low-confidence text predictions. To do this, we do a final check on each word box to make sure its maximum confidence value exceeds some detection threshold. This is the threshold used for that check.
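The binarization described for `text_threshold` and `link_threshold` can be sketched in a few lines. This sketch assumes a strict "greater than" comparison, which is what OpenCV's `THRESH_BINARY` uses. It also assumes the two binary maps are combined with a logical OR before labeling, which is what CRAFT-pytorch does. `binarize` and `combine` are hypothetical helpers, and plain nested lists stand in for the real arrays.

```python
def binarize(score_map, threshold):
    """Convert confidences (0.0 to 1.0) to 0/1 classes: 1 where score > threshold."""
    return [[1 if score > threshold else 0 for score in row] for row in score_map]


def combine(text_binary, link_binary):
    """Logical OR of the binary text and link maps (the input to component labeling)."""
    return [[max(t, l) for t, l in zip(t_row, l_row)]
            for t_row, l_row in zip(text_binary, link_binary)]


# Two character pixels separated by a gap that only the link map covers.
text_map = [[0.5, 0.2, 0.6]]
link_map = [[0.1, 0.7, 0.1]]

text_binary = binarize(text_map, threshold=0.4)  # [[1, 0, 1]]
link_binary = binarize(link_map, threshold=0.4)  # [[0, 1, 0]]
print(combine(text_binary, link_binary))         # [[1, 1, 1]]
```

Here the link map bridges the two characters, so they end up in one component (one word). Raising `link_threshold` to 0.7 or higher would leave the middle pixel at 0 and split them into separate components.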
I've added this documentation in 717bfcd. Thanks for raising these questions!