savkov / bratutils
A collection of utilities for manipulating data and calculating inter-annotator agreement in brat annotation files.
License: MIT License
The merge document utilities seem to be in disarray. It would be nice to update them and write some proper tests for them.
Support for relations has long been asked for, but I've been reluctant to implement it because the code is not my best and I haven't wanted to go back into the heavy logic. However, I just worked on getting the parsing function to handle all annotation types gracefully, and it looks like relations can be implemented in a way that is self-contained and probably quite straightforward. So I'll lay out what I want to do here and ask for feedback.
Relations are effectively triples of two arguments and a relation type. Assuming that the possible arguments are predetermined, e.g. that arguments can only be tokens, chunks, or some other pre-annotated spans, evaluating agreement is really quite easy: F1-score where each triple is treated as a unique annotation. I can probably copy most of the code straight from bioeval.
I haven't thought about this for very long, but using F1-score seems like a bit of a copout here: the probability of a random assignment of a relation is not vanishingly small, so maybe kappa could be implemented here instead.
Additionally, in many cases the arguments are not necessarily predetermined, which would be quite hard to evaluate at the same time, and honestly I have no idea how to do that at the moment.
So I'm looking for some input here. It would be nice to hear what you think.
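The triple-based evaluation described above can be sketched in a few lines. This is a minimal illustration, not part of the bratutils API: each relation is reduced to a hashable triple of (relation type, arg1 span, arg2 span), and the two annotators' sets are scored with exact-match precision, recall, and F1. The relation names and spans below are made up for the example.

```python
# Sketch: inter-annotator F1 over relation triples, where each triple
# (relation_type, arg1_span, arg2_span) is treated as a unique annotation.
# Function and variable names are illustrative, not bratutils internals.

def relation_f1(relations_a, relations_b):
    """Exact-match precision/recall/F1 between two sets of relation triples.

    relations_a is treated as the reference, relations_b as the candidate.
    """
    set_a, set_b = set(relations_a), set(relations_b)
    matched = len(set_a & set_b)
    precision = matched / len(set_b) if set_b else 0.0
    recall = matched / len(set_a) if set_a else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical annotations from two annotators:
ann_a = [('Lives_In', (0, 5), (10, 18)), ('Lives_In', (20, 25), (30, 38))]
ann_b = [('Lives_In', (0, 5), (10, 18))]
print(relation_f1(ann_a, ann_b))  # precision 1.0, recall 0.5
```

For kappa, the same triples would additionally need a chance-agreement estimate over the space of possible (type, arg1, arg2) combinations, which is only well defined when the argument inventory is fixed, as noted above.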
I'm guessing relations never got added, as I still receive errors. Has anyone come up with a simple fix to ignore relations so it still runs?
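One workaround, offered here only as a sketch, is to pre-filter the .ann files so that only text-bound (T) lines reach bratutils, writing the filtered copies to a separate directory. The directory names and helper function are made up for the example.

```python
# Workaround sketch: copy .ann files, keeping only text-bound (T) lines,
# so relation (R) and attribute (A) lines never reach the parser.
import glob
import os

def strip_non_entities(src_dir, dst_dir):
    """Write filtered copies of all .ann files in src_dir to dst_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    for path in glob.glob(os.path.join(src_dir, '*.ann')):
        with open(path) as f:
            kept = [line for line in f if line.startswith('T')]
        out_path = os.path.join(dst_dir, os.path.basename(path))
        with open(out_path, 'w') as f:
            f.writelines(kept)
```

The filtered directory can then be passed to a.DocumentCollection as usual; relations simply won't be scored.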
I have never used attribute annotations and am not sure which tasks they are part of. It would be nice if someone took the lead on designing this; I would be willing to help with integrating it into the project.
As reported in #14, relations and attributes crash the parsing function. This should be easy to fix, as the problem seems to be that the parsing is not generic enough. It also looks like a good place to start for supporting relations and attributes in agreement.
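A more generic parse, sketched below, dispatches on the ID prefix instead of assuming every line is a text-bound annotation. The record shapes follow the brat standoff format (T for text-bound with possibly discontinuous ';'-separated spans, R for relations, A for attributes); the function name and tuple layout are illustrative, not bratutils internals.

```python
# Sketch: prefix-dispatched parsing of brat standoff lines.
# Returns a tagged tuple per line; unknown prefixes pass through.

def parse_line(line):
    ann_id, rest = line.rstrip('\n').split('\t', 1)
    if ann_id.startswith('T'):
        # text-bound: "<tag> <spans>\t<surface text>";
        # discontinuous spans are ';'-separated "start end" fragments
        header, text = rest.split('\t', 1)
        tag, spans = header.split(' ', 1)
        offsets = [tuple(map(int, frag.split())) for frag in spans.split(';')]
        return ('T', ann_id, tag, offsets, text)
    if ann_id.startswith('R'):
        # relation: "<type> Arg1:Tx Arg2:Ty"
        rtype, a1, a2 = rest.split()
        return ('R', ann_id, rtype, a1.split(':')[1], a2.split(':')[1])
    if ann_id.startswith('A'):
        # attribute: "<name> <target> [<value>]"
        return ('A', ann_id, *rest.split())
    return ('other', ann_id, rest)
```

This also fixes the crash on discontinuous spans such as '6419;6435' reported below, since the span field is split on ';' before the int() conversion.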
Hi,
I was very happy to find this code; I was looking for something to compare brat annotations across files.
Did you ever look at implementing relations?
Hello. I need to compare automatic annotations produced by a software application with manual annotations (in brat standoff format), and this seems like a nice tool to use.
While testing it and trying to understand the source code, I tried the following small sample code:
import agreement as a
doc = a.Document("myfile.ann")
doc2 = a.Document("myfile.ann")
doc.make_gold()
statistics = doc2.compare_to_gold(doc)
However, on execution of the compare_to_gold function, it says that the Document instance has no attribute 'postag_list', which is true, but I don't understand where this comes from either.
Am I missing something? Could you eventually post a small working example for comparing two .ann files? I'd appreciate that.
Thanks.
I would like to use this in a corpus annotation project that uses discontinuous annotations, but I receive the following error:
Traceback (most recent call last):
  File "vso-inter-annotator.py", line 5, in <module>
    doc = a.DocumentCollection('data/BoireAnnotations/VSO_Hypertension1/')
  File "build/bdist.macosx-10.10-intel/egg/bratutils/agreement.py", line 834, in __init__
  File "build/bdist.macosx-10.10-intel/egg/bratutils/agreement.py", line 654, in __init__
  File "build/bdist.macosx-10.10-intel/egg/bratutils/agreement.py", line 292, in __init__
  File "build/bdist.macosx-10.10-intel/egg/bratutils/agreement.py", line 301, in _parse_annotation
ValueError: invalid literal for int() with base 10: '6419;6435'
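The offending value '6419;6435' is a brat discontinuous span: fragments are separated by ';' and each fragment is a "start end" pair. A hedged sketch of the fix (the function name is illustrative, not a bratutils internal) is to split the span field on ';' before converting to int:

```python
# Sketch: parse a brat span field, including discontinuous spans.

def parse_spans(span_field):
    """'6405 6419;6435 6442' -> [(6405, 6419), (6435, 6442)]"""
    return [tuple(map(int, frag.split())) for frag in span_field.split(';')]

print(parse_spans('6405 6419;6435 6442'))  # [(6405, 6419), (6435, 6442)]
```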
vso-inter-annotator.py contains the following:
#VSO inter-rater agreement using BRAT utils
from bratutils import agreement as a
doc = a.DocumentCollection('data/BoireAnnotations/VSO_Hypertension1/')
doc2 = a.DocumentCollection('data/HerringAnnotations/VSO_Hypertension1/')
doc.make_gold()
statistics = doc2.compare_to_gold(doc)
print statistics
Here is the annotation file that is causing the error.
T1 VSO_0000005 3395 3407 182/107 mmHg
T2 VSO_0000005 4300 4312 200/100 mmHg
T3 VSO_0000008 4254 4260 36.8°C
T4 VSO_0000005 6518 6529 160/80 mmHg
T5 VSO_0000005 15833 15844 170/80 mmHg
T6 VSO_0000038 16385 16408 Systolic blood pressure
T7 VSO_0000005 16438 16446 200 mmHg
T8 VSO_0000005 16867 16878 135/95 mmHg
T9 VSO_0000005 16959 16971 160/100 mmHg
T10 VSO_0000005 17659 17671 220/120 mmHg
T11 VSO_0000005 18143 18154 135/95 mmHg
T12 VSO_0000004 3370 3384 blood pressure
T13 VSO_0000007 4239 4250 temperature
T14 VSO_0000004 4282 4296 blood pressure
T15 VSO_0000004 6486 6500 Blood pressure
T16 VSO_0000004 15802 15816 Blood pressure
T17 VSO_0000004 16826 16840 Blood pressure
T18 VSO_0000004 16941 16955 blood pressure
T19 VSO_0000004 17624 17638 Blood pressure
T20 GO_0008217 17713 17738 Blood pressure normalized
T21 VSO_0000004 18125 18139 blood pressure
T23 VSO_0000030 4341 4360 63 beats per minute
T24 GO_0008217 6405 6419;6435 6442 blood pressure control
T31 GO_0008217 16046 16060;16072 16079 blood pressure control
T33 VSO_0000006 16826 16844;16855 16863 Blood pressure was measured
T34 GO_0008217 17015 17029;17041 17048 blood pressure control
T38 VSO_0000029 4314 4324 Heart rate
T39 VSO_0000004 6147 6161 blood pressure
T41 GO_0008217 6486 6514 Blood pressure was decreased
T43 VSO_0000006 15802 15829 Blood pressure was measured
T22 VSO_0000004 6405 6419 blood pressure
T25 VSO_0000004 16046 16060 blood pressure
T26 VSO_0000004 17015 17029 blood pressure
T27 VSO_0000004 17713 17727 Blood pressure
T28 VSO_0000004 18517 18531 blood pressure
Thank you
bratutils/src/bratutils/agreement.py, line 444 in 0b3ba5c
Many thanks for your work, it saved me days.
Hi Savkov,
Knowing about this tool before would have saved me a lot of time. I used the NLTK package to measure IAA of brat annotation files, and it was a bit of a nightmare to convert the .ann files into something readable. So I think this tool is very useful, and the code is great, congratulations!
Our problem is that we have data structured in this way:
T1 Food 24 31 bacalao
T2 Restaurant 0 8 Un sitio
T3 Restaurant 46 54 Un lugar
T5 Restaurant 55 66 con encanto
A3 Polarity T5 POS
A4 Restaurant_Aspects T5 General_experience
R2 refers_to Arg1:T5 Arg2:T3
T4 Food 34 43 riquísimo
A1 Polarity T4 POS
A2 Food_Aspects T4 General_experience
R1 refers_to Arg1:T4 Arg2:T1
And we want to measure agreement for the three categories: entities (Food#bacalao), attributes (aspect --> General_experience#con encanto; and polarity --> POS), and also relations (R1 refers_to ...). Are you planning to implement these options too? It would be really useful for annotation in aspect-based sentiment analysis.
Many thanks
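The three layers asked about above can each be reduced to hashable tuples and scored with set overlap (exact-match agreement), along the lines of the sketch below. This is an illustration, not the bratutils API: attribute and relation tuples resolve their T-targets to (tag, start, end) spans so they are comparable across annotators, and it assumes simple (non-discontinuous) spans and that T lines precede the A/R lines that reference them, as in the example file.

```python
# Sketch: load brat entities, attributes, and relations as tuple sets
# suitable for set-overlap agreement scoring.

def load_layers(lines):
    entities, attributes, relations = set(), set(), set()
    spans = {}  # Txx -> (tag, start, end), for resolving A/R targets
    for line in lines:
        ann_id, rest = line.rstrip('\n').split('\t', 1)
        if ann_id.startswith('T'):
            header, text = rest.split('\t', 1)
            tag, start, end = header.split()
            span = (tag, int(start), int(end))
            spans[ann_id] = span
            entities.add(span)
        elif ann_id.startswith('A'):
            parts = rest.split()  # name target [value]
            value = parts[2] if len(parts) > 2 else None
            attributes.add((parts[0], spans[parts[1]], value))
        elif ann_id.startswith('R'):
            rtype, a1, a2 = rest.split()
            relations.add((rtype, spans[a1.split(':')[1]],
                           spans[a2.split(':')[1]]))
    return entities, attributes, relations
```

With two annotators loaded this way, each layer's agreement reduces to intersection counts over the corresponding sets.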
Hello again,
I have two .ann files.
The gold set:
T1 Medical-Concept 36 41 tumor
T2 Medical-Concept 327 351 síndrome mielodisplásica
T3 Medical-Concept 440 445 tumor
T4 Medical-Concept 22 32 morfologia
T5 Medical-Concept 79 117 Nomenclatura Sistematizada de Medicina
T6 Medical-Concept 120 126 SNOMED
T7 Medical-Concept 189 204 Linfoma maligno
T8 Medical-Concept 207 216 folicular
T9 Medical-Concept 220 227 nodular
T10 Medical-Concept 270 310 Anemia refratária com excesso de blastos
T11 Medical-Concept 356 366 deleção 5q
T12 Medical-Concept 368 371 5q-
And the candidate set:
T1 Medical-Concept 270 287 Anemia refratária
T2 Medical-Concept 327 335 Síndrome
T3 Medical-Concept 471 476 seção
For the comparison I'm running the following code:
from bratutils import agreement as a
__author__ = 'Aleksandar Savkov'
doc = '3711'
gold = a.Document('../res/ht_gold/' + doc + '.ann')
extension = a.Document('../res/ht_extension/' + doc + '.ann')
gold.make_gold()
statistics = extension.compare_to_gold(gold)
print statistics
This should produce as a result: 0 correct, 12 missing, and 3 spurious tags, right?
The produced result is 3 missing tags and 0 correct/partial/spurious. I think the spurious tags are not being handled correctly.
Is my thinking right, or this is actually the desired output?
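For what it's worth, the expected counts above can be checked with a small exact-span-matching sketch (not the bratutils algorithm, which also distinguishes partial matches): none of the three candidate spans matches a gold span exactly, so exact matching gives 0 correct, 12 missing, 3 spurious.

```python
# Sketch: exact-span correct/missing/spurious counts via set algebra.

def exact_match_counts(gold_spans, cand_spans):
    gold, cand = set(gold_spans), set(cand_spans)
    return len(gold & cand), len(gold - cand), len(cand - gold)

# (start, end) offsets copied from the two .ann files above;
# all tags are Medical-Concept, so spans alone suffice here.
gold = {(36, 41), (327, 351), (440, 445), (22, 32), (79, 117), (120, 126),
        (189, 204), (207, 216), (220, 227), (270, 310), (356, 366), (368, 371)}
cand = {(270, 287), (327, 335), (471, 476)}
print(exact_match_counts(gold, cand))  # (0, 12, 3)
```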
Hugo