Comments (4)
Hi. Please do send your example data. We'll have a quick look.
On Oct 8, 2014 8:19 AM, "slbayer" [email protected] wrote:
I have a minimal case that seems to break the scorer (and I confess, I'm
using the scorer for an entity clustering and linking evaluation which
isn't TAC KBP).

Let's say your corpus contains two documents. Document 1 contains a gold
mention, and document 2 contains no gold mentions; for the evaluated
system, document 1 contains no mentions and document 2 contains one
mention. This minimal case causes the scorer to fail as follows:

INFO Converting gold to evaluation format..
INFO Converting systems to evaluation format..
INFO Evaluating systems..
neleval/evaluate.py:173: StrictMetricWarning: Strict P/R defaulting to zero score for zero denominator
  StrictMetricWarning)
INFO Preparing summary report..
INFO Calculating confidence intervals..
neleval/evaluate.py:173: StrictMetricWarning: Strict P/R defaulting to zero score for zero denominator
  StrictMetricWarning)
INFO preparing strong_link_match report..
INFO preparing strong_nil_match report..
INFO preparing strong_all_match report..
INFO preparing strong_typed_link_match report..
INFO Preparing error report..
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/Volumes/Blinken/Projects/NEET-CO/TAC_2014_scoring/neleval/neleval/__main__.py", line 60, in <module>
    main()
  File "/Volumes/Blinken/Projects/NEET-CO/TAC_2014_scoring/neleval/neleval/__main__.py", line 57, in main
    print(obj())
  File "neleval/analyze.py", line 75, in __call__
    counts = Counter(error.label for error in _data())
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/collections.py", line 444, in __init__
    self.update(iterable, **kwds)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/collections.py", line 525, in update
    for elem in iterable:
  File "neleval/analyze.py", line 75, in <genexpr>
    counts = Counter(error.label for error in _data())
  File "neleval/analyze.py", line 86, in iter_errors
    assert g.id == s.id
AssertionError

I'd be happy to send you the minimal test I've set up, if you need it. I'd
try to fix it myself, but I'm hoping that you'll be faster :-).

I'm working with the latest clone of the repository. MacOS 10.9.5,
Python 2.7.5, numpy-1.9.0, scipy-0.14.0, joblib 0.8.3-r1, nose 1.3.4.
from neleval.
I've sent a zip file to your Gmail address.
There are two problems here.
One problem appears to be that you're assuming in iter_errors that Reader() is being used with group=by_document. The logic you have for iter_errors makes no sense if group=by_mention, because zip() isn't guaranteed to pair them correctly, at all. More to the point, I can't figure out why you'd even use by_mention here; it's the only place this grouping is used, and the logic doesn't seem to justify using it.
The other problem is that even when you remove by_mention, and group by document, you get an error if documents don't contain any annotations.
No matter how you do this, zip isn't going to cut it for you; you're going to have to collect all the indexes, generate MISSING for the documents/indexes that don't match at all, and THEN process the remaining pairs.
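To make the suggestion concrete, here is a minimal sketch of a keyed, setwise comparison. It is not neleval's code: the `gold` and `system` layout (a dict from document id to a dict from span to link), the function names, and the label strings are all assumptions made for illustration.

```python
def pair_by_position(gold_ids, sys_ids):
    """What zip() does: pair items by position, whether or not the ids agree."""
    return list(zip(gold_ids, sys_ids))


def iter_errors_by_key(gold, system):
    """Yield (doc_id, span, label) by comparing keyed mention sets.

    gold and system map doc_id -> {span: link}. This layout and the
    label names are hypothetical, not neleval's actual types.
    """
    # Collect every document id seen on either side, so a document with
    # no annotations on one side is still visited.
    for doc_id in sorted(set(gold) | set(system)):
        g = gold.get(doc_id, {})
        s = system.get(doc_id, {})
        # Same idea one level down: the union of spans within the document.
        for span in sorted(set(g) | set(s)):
            if span not in s:
                yield doc_id, span, 'missing'
            elif span not in g:
                yield doc_id, span, 'extra'
            elif g[span] != s[span]:
                yield doc_id, span, 'wrong-link'
            else:
                yield doc_id, span, 'correct'


if __name__ == '__main__':
    # The two-document case from the report: gold has a mention only in
    # doc1, the system has a mention only in doc2.
    gold = {'doc1': {(0, 5): 'E1'}}
    system = {'doc2': {(3, 9): 'E2'}}
    print(pair_by_position(sorted(gold), sorted(system)))
    for error in iter_errors_by_key(gold, system):
        print(error)
```

On the two-document case, positional pairing matches doc1 with doc2, which is the kind of mismatch the `assert g.id == s.id` check trips over. The keyed version instead reports one missing mention for doc1 and one extra mention for doc2.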
This might have been something not properly fixed up when we moved it from handling AIDA-CoNLL-style data to TAC EDL-style data; analyze was ported in a rush. A setwise comparison should be straightforward. Thanks for the code review!
Fixed in 46d9233.
Thanks again, @slbayer. Please let us know if you notice anything else.