cfedermann / appraise
Appraise evaluation system for manual evaluation of machine translation output
Home Page: http://www.appraise.cf/
License: BSD 3-Clause "New" or "Revised" License
The Overview page should always show the number of HITs completed by the current user, the average time per HIT, etc.
However, when no HITs are available, I see only this message: "At this moment, there are no HITs available to work on. Check back soon...".
Our annotators may worry that their work was lost.
On /appraise/admin/evaluation/evaluationtask/add/, the multiple-selection box for choosing users is too small, making it hard to tell how many and which users have been assigned to the task. Given how critical this functionality is, we should consider a more user-friendly interface (i.e., a different widget).
We have just come to preparing the set to be used for measuring intra-annotator agreement in the ranking task, where we have to present the same 96 evaluation items a second time, though preferably not in the original sequence. Since randomized order is not supported by Appraise, we thought of randomizing them in the input files. The original format specified in the Specification Document would allow the system to derive the context from the original file and display the items in the order specified in the task file. Unfortunately, the simplification of the import data scheme breaks that requirement, and randomizing the items before import would mean that the context is not consistent. We would need a way to overcome this problem, i.e., allowing randomization upon choosing the next item, for particular sets.
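A minimal sketch of one way this could work, assuming a hypothetical helper (not part of Appraise) that shuffles the presentation order while retaining each item's original index, so the context can still be derived from the original file:

```python
import random

def randomized_presentation(items, seed=None):
    """Return (original_index, item) pairs in shuffled presentation
    order. Keeping the original index alongside each item means the
    original sequence -- and hence each item's context -- can still
    be recovered after randomization."""
    rng = random.Random(seed)  # fixed seed makes the order reproducible
    order = list(range(len(items)))
    rng.shuffle(order)
    return [(i, items[i]) for i in order]
```

Storing the original index with each presented item is the key point: the display order can be arbitrary, but context lookup always goes through the index into the original file.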
In Firefox, the Reset button does nothing. Even if a user has already selected some ranks, clicking this button doesn't reset them to NIL.
This is a minor issue and doesn't affect functionality, so instead of fixing it, the button could also simply be removed.
Exported CSV files (generated from the admin page) do not have a '\n' character on the last line. This creates problems when the file is concatenated with other files. Please add a trailing newline.
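A trivial sketch of the requested fix, as a hypothetical helper applied to the export text before it is written out:

```python
def ensure_trailing_newline(csv_text):
    """Append a final newline if the export does not already end with
    one, so that concatenating several CSV files never runs the last
    record of one file into the first record of the next."""
    return csv_text if csv_text.endswith("\n") else csv_text + "\n"
```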
Appraise requires Django >=1.3 but the web app doesn't work with 1.3.1.
When pointing the browser to "http://127.0.0.1:8000/appraise/" with the server running, I get the following error:
ImportError at /appraise/
cannot import name patterns
Request Method: GET
Request URL: http://127.0.0.1:8000/appraise/
Django Version: 1.3.1
Exception Type: ImportError
Exception Value:
cannot import name patterns
Exception Location: /home/a/software/Appraise-Software/appraise/../appraise/urls.py in , line 7
Python Executable: /usr/bin/python
Python Version: 2.7.6
Python Path:
['/home/a/software/Appraise-Software/appraise',
'/usr/local/lib/python2.7/dist-packages/langid-1.1.4dev-py2.7.egg',
'/usr/lib/python2.7',
'/usr/lib/python2.7/plat-x86_64-linux-gnu',
'/usr/lib/python2.7/lib-tk',
'/usr/lib/python2.7/lib-old',
'/usr/lib/python2.7/lib-dynload',
'/usr/local/lib/python2.7/dist-packages',
'/usr/lib/python2.7/dist-packages',
'/usr/lib/python2.7/dist-packages/PILcompat',
'/usr/lib/python2.7/dist-packages/gst-0.10',
'/usr/lib/python2.7/dist-packages/gtk-2.0',
'/usr/lib/pymodules/python2.7',
'/usr/lib/python2.7/dist-packages/ubuntu-sso-client']
Server time: Wed, 27 Aug 2014 16:01:59 +0200
"Permission denied" given
How should the system ranking clusters be computed if systems often produce the same output and are merged in the results CSV file? Is using the scripts/compute_ranking_clusters.perl script the correct way?
This script seems to ignore merged systems in the results CSV file (sysA+sysB will be treated as a separate, new system). I have fixed it in this commit in my fork. Was that the correct thing to do, or is there a better way of getting the ranking clusters?
(Without this fix, the clustering script would get stuck in an infinite loop on my data, i.e., several variants of the same NLG system that often produce identical outputs.)
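A sketch of the idea behind such a fix, assuming hypothetical row dictionaries with `system1`/`system2` fields and merged names joined by `+` (the actual column names in the WMT results CSV may differ):

```python
def split_merged_systems(rows):
    """Expand rows whose system fields contain merged names like
    'sysA+sysB' into one row per underlying system, so a clustering
    script never treats 'sysA+sysB' as a brand-new system."""
    out = []
    for row in rows:
        for s1 in row["system1"].split("+"):
            for s2 in row["system2"].split("+"):
                new = dict(row)  # copy all other fields unchanged
                new["system1"], new["system2"] = s1, s2
                out.append(new)
    return out
```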
I don't see which group I am part of. (Silly workaround: do 1 HIT and check which group's progress has increased.)
The name of my group could be shown:
Also in the "Language pair status", the language pairs I have selected could be highlighted/marked.
The file appraise/start-server.sh.sample refers to a file named appraise.conf in the lighttpd invocation. Could you add an example of such a file to the repository for those of us trying to set Appraise up under that server?
Activate the registration view that allows WMT13 participants to request/create user accounts.
Open issues to decide on:
Check whether django.contrib.auth provides generic views for this.
Users should be able to select tasks to work on.
The list of available tasks should be filtered as follows:
This issue was reported by one of the translator agencies participating in TaraXU's evaluation round 2 and was confirmed by Cindy Tscherwinka.
"I am now working on Task 2 and I cannot tick the ‘translate from scratch’ box if none of the sentences can be post-edited easily. I get a ‘stop’ symbol which appears over the box when I try to tick it."
Cindy posted: "I had the same question as I tested the system. At least I think she is referring to that issue: you still have to select one of the sentences that is not easy to post-edit before you can select “translate from scratch”. "
It should be possible from the Django admin backend to retire a campaign.
This should also retire any associated objects such as tasks, items, and maybe results. The corresponding campaign team should also be retired.
If input data is exactly identical (= same JSON, same IDs, same campaign) then we should not create redundant task instances. This only pollutes the database.
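One possible approach, sketched with a hypothetical fingerprint helper: hash the canonicalised task JSON together with the campaign identifier, and reject an import whose fingerprint already exists in the database.

```python
import hashlib
import json

def task_fingerprint(campaign_id, task_json):
    """Stable fingerprint for a task upload: same campaign + same JSON
    (same IDs) => same digest, so a duplicate import can be detected
    and rejected before any redundant task instance is created."""
    # sort_keys makes the digest independent of key ordering
    canonical = json.dumps(task_json, sort_keys=True, separators=(",", ":"))
    payload = "{}\n{}".format(campaign_id, canonical)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Storing the fingerprint in an indexed, unique database column would make the duplicate check a single lookup at import time.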
Instead of having various file-level scripts for shell admin operations, add custom management commands and clean things up...
See the Django documentation here:
Check that we don't generate empty (i.e., sequence of five PLACEHOLDER systems) CSV export lines for RankingResult instances. Check that this is only happening for skipped ranking tasks.
We have seen spaces, non-ASCII characters, and symbols resulting in server errors during signup POST submission. Fix this and give a better error message.
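A minimal sketch of server-side validation, assuming a hypothetical whitelist policy (ASCII letters, digits, hyphen, underscore) that returns a readable message instead of crashing:

```python
import re

# hypothetical policy: 2-30 chars from a conservative ASCII whitelist
USERNAME_RE = re.compile(r"^[A-Za-z0-9_-]{2,30}$")

def validate_username(name):
    """Return None if the name is acceptable, otherwise a
    human-readable error message to show on the signup form."""
    if USERNAME_RE.match(name):
        return None
    return ("User names may only contain ASCII letters, digits, "
            "'-' and '_' (2-30 characters).")
```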
Would it be possible to filter out, or group, equal hypotheses? There is no point in ranking identical outputs, and in fact I lose time when ranking 5 hypotheses trying to find the differences between two of them (they are often identical, and sometimes differ only by one character). It would be easier if we knew that all outputs were different.
In the current interface, an admin cannot see the progress of the tasks unless they grant themselves permission on the task and log in as a normal user.
Hello,
Can you please elaborate what license is Appraise under? I didn't see any license information or file in the repo?
Thanks
If, for some reason, Django encounters an HTTP 404 or 500 error, the page template rendering of either 404.html or 500.html breaks the navigation links at the top of the page. PREFIX_URL is not handed over as a parameter and hence not prepended to the URLs.
Django documentation on customising error views is available here:
The JavaScript-based computation of durations introduced in 9b02f52 needs to be integrated into:
Clean up the repository and remove outdated:
Implement A>B, A=B, A<B ranking task for n=2 translation alternatives.
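The three-way outcome could be derived from the existing rank values; a sketch with a hypothetical helper, assuming a lower rank number means a better translation:

```python
def compare_label(rank_a, rank_b):
    """Map the two assigned ranks of an n=2 task to the three-way
    label A>B / A=B / A<B (lower rank number = better)."""
    if rank_a < rank_b:
        return "A>B"
    if rank_a > rank_b:
        return "A<B"
    return "A=B"
```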
The readme states Appraise should work with Django >=1.3, but it does not work with any 1.3 version, because certain symbols imported in urls.py lived in django.conf.urls.defaults in 1.3, but in django.conf.urls from 1.4 on.
It also does not work with >=1.5, because the patterns API changed, and there is an open issue with 1.6.
To the best of my knowledge, it only works with 1.4.20, which will be supported until October 2015.
Could you update the readme to reflect this?
Use random sample of 10 HITs per WMT14 language pair and allow infinite collection of annotation results on these...
Create DEMO group for demo users.
Should be easy, however needs some testing to make sure nothing is broken for evaluation campaign R2...
All users should be able to see the required hours per group (REQUIRED_HOURS_PER_GROUP). Use color coding to make people feel better ;)
When installing and running the collectstatic command, it couldn't find jquery.js. Do I have to install other packages besides django==1.3?
~/Appraise-Software/appraise$ python manage.py collectstatic
WARNING:appraise.utils:NLTK is NOT available, using fallback AnnotationTask class instead. This does NOT implement any NLTK features!
You have requested to collect static files at the destination
location as specified in your settings file.
This will overwrite existing files.
Are you sure you want to do this?
Type 'yes' to continue, or 'no' to cancel: yes
Copying '/home/ltan/Appraise-Software/appraise/static/admin/js/jquery.js'
Traceback (most recent call last):
File "manage.py", line 23, in <module>
execute_manager(settings)
File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 438, in execute_manager
utility.execute()
File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 379, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 191, in run_from_argv
self.execute(*args, **options.__dict__)
File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 220, in execute
output = self.handle(*args, **options)
File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 351, in handle
return self.handle_noargs(**options)
File "/usr/local/lib/python2.7/dist-packages/django/contrib/staticfiles/management/commands/collectstatic.py", line 89, in handle_noargs
self.copy_file(path, prefixed_path, storage, **options)
File "/usr/local/lib/python2.7/dist-packages/django/contrib/staticfiles/management/commands/collectstatic.py", line 199, in copy_file
shutil.copy2(source_path, full_path)
File "/usr/lib/python2.7/shutil.py", line 130, in copy2
copyfile(src, dst)
File "/usr/lib/python2.7/shutil.py", line 83, in copyfile
with open(dst, 'wb') as fdst:
IOError: [Errno 2] No such file or directory: u'/static-files/admin/js/jquery.js'
It's just been noticed that when a big word appears on the left column of the error classification pane, some of the radio buttons get "wrapped" to the next line, although there is a lot of space around the "summary box" where the left column could expand. To reproduce it, just narrow your browser window. You will see that the right column reserves empty space around the summary box whereas the left column gets suppressed/wrapped.
Many times, the system outputs for a sentence are identical. Rather than constructing each task from a random subset of systems, each task should be constructed from the set of distinct outputs for that sentence. The pairwise rankings could then be re-associated with the systems to generate a larger set of pairwise rankings.
This would be a bit more respectful of people's times (it's annoying to see identical outputs), and would also let us potentially gather data more quickly. On the WMT14 data, for example, there are identical system outputs on over half the sentences.
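A sketch of both halves of this proposal, using hypothetical helper names: group systems by identical output text, then expand group-level comparisons back into system-level pairwise rankings.

```python
from collections import defaultdict
from itertools import combinations

def distinct_output_groups(outputs):
    """Group systems by identical output text. outputs maps
    system name -> output string; annotators would rank one
    representative per group."""
    groups = defaultdict(list)
    for system, text in outputs.items():
        groups[text].append(system)
    return list(groups.values())

def expand_pairwise(ranked_groups):
    """Given groups of systems ordered best-first, re-associate each
    group-level comparison with every underlying system pair,
    yielding (better, worse) tuples."""
    pairs = []
    for better, worse in combinations(ranked_groups, 2):
        for b in better:
            for w in worse:
                pairs.append((b, w))
    return pairs
```

Systems within the same group produced identical output, so no pair is emitted between them; only cross-group comparisons carry ranking information.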
CC: @cfedermann
The tag 'appraise-results' is closed by the tag 'appraise-result' (missing 's').
Copy admin files to static folder for deployment without DEBUG=True.
It seems I forgot my password...
After downloading and running Appraise it seems that the current code is intended to work only for WMT15 evaluation. I would like to create my own evaluation tasks but the corresponding link does not exist. How can I use the current version of Appraise for preparing my own evaluations?
Extend user accounts (or associated profiles) with information about a user's affiliation.
OR in case that does not work, create respective groups inside Django's admin backend and thus allow to assign users to "affiliation groups".
The latter option would work out-of-the-box.
Add code to properly randomize SECRET_KEY inside settings.py, as likely none of the users would otherwise do it.
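A sketch of one way to generate such a key, modelled on the character set the Django-1.x startproject template used (the exact alphabet is an assumption here; any sufficiently large random string works):

```python
import random
import string

def make_secret_key(length=50):
    """Generate a random SECRET_KEY string using the OS entropy pool
    (random.SystemRandom), 50 characters drawn from letters, digits,
    and a handful of punctuation characters."""
    chars = string.ascii_lowercase + string.digits + "!@#$%^&*(-_=+)"
    rng = random.SystemRandom()
    return "".join(rng.choice(chars) for _ in range(length))
```

This could run once at install time and write the result into settings.py, so that no two deployments share the default key.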
In ascertaining whether enough HITs have been collected, it would be really helpful if the Language pair status page listed the number of systems, e.g.,
English → Czech (X systems) 519 remaining 1481 completed
Add updated dictionary with current HIT completion requirements for WMT15 participants.
There is a validation error every time I press submit on the Validation task. It is reproducible for every existing ranking task.
These are two minor (mostly template-related) issues observed in Firefox 8. They do not affect functionality.
Have a look here: http://www.dfki.de/~elav01/tmp/appraise-screenshot.png
Add an additional "language pairs" attribute to user profiles.
This does require the addition of customised user profiles;
see the Django documentation over here:
Seems this has improved greatly in Django v1.5:
The web application crashes with Django 1.5. This is the error shown in the browser:
NoReverseMatch at /appraise/
'url' requires a non-empty first argument. The syntax changed in Django 1.5, see the docs.
Request Method: GET
Request URL: http://127.0.0.1:8000/appraise/
Django Version: 1.5
Exception Type: NoReverseMatch
Exception Value:
'url' requires a non-empty first argument. The syntax changed in Django 1.5, see the docs.
Exception Location: /usr/local/lib/python2.7/dist-packages/django/template/defaulttags.py in render, line 402
Python Executable: /usr/bin/python
Python Version: 2.7.6
Python Path:
['/home/a/software/Appraise-Software/appraise',
'/usr/local/lib/python2.7/dist-packages/langid-1.1.4dev-py2.7.egg',
'/usr/lib/python2.7',
'/usr/lib/python2.7/plat-x86_64-linux-gnu',
'/usr/lib/python2.7/lib-tk',
'/usr/lib/python2.7/lib-old',
'/usr/lib/python2.7/lib-dynload',
'/usr/local/lib/python2.7/dist-packages',
'/usr/lib/python2.7/dist-packages',
'/usr/lib/python2.7/dist-packages/PILcompat',
'/usr/lib/python2.7/dist-packages/gst-0.10',
'/usr/lib/python2.7/dist-packages/gtk-2.0',
'/usr/lib/pymodules/python2.7',
'/usr/lib/python2.7/dist-packages/ubuntu-sso-client']
Server time: Wed, 27 Aug 2014 16:09:03 +0200
Add a button that allows resetting the error classes for a single word only.
Matt pointed out these points:
(A nice way would be a gradient from green to red aligned with the buttons)
Appraise crashes with Django 1.6.1 (the version of Django included in Ubuntu 14.04).
When running "python manage.py syncdb", I get the following error:
Traceback (most recent call last):
File "manage.py", line 7, in
from django.core.management import execute_manager
ImportError: cannot import name execute_manager
My average time across two language pairs, 4' for each, is shown as 8' instead of 4'.
It would be nice to have a URL I could query to automate a RankingResult CSV export. e.g.,
wget -O backup.csv http://appraise.cf/admin/wmt14/rankingresult/
I think security-through-obscurity would be fine here, but requiring a secret token or something via a GET parameter would also work.
Compute inter-annotator agreement (IAA) for the number of annotators or coders (C) that maximises the number of items (I) evaluated by the respective subset of coders. This avoids showing no IAA scores until all coders have completed the task.
Additionally, there could be checkboxes to toggle whether a system should be included in the IAA computation. That way, different scenarios could be explored in a more playful way, from within the status view.
If implemented, the checkbox selection should also be reflected when downloading results data.
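The coder-subset selection described above could be sketched as follows, with a hypothetical helper that tries every subset of coders and keeps the one whose shared items are most numerous (a brute-force search; fine for the small coder counts typical of these campaigns):

```python
from itertools import combinations

def best_coder_subset(annotations):
    """annotations maps coder -> set of item ids they have judged.
    Return (coders, shared_items) for the subset of at least two
    coders whose commonly-judged items are most numerous, preferring
    larger subsets on ties. This lets IAA be reported before every
    coder has finished the task."""
    coders = list(annotations)
    best, best_items = (), set()
    for size in range(2, len(coders) + 1):
        for subset in combinations(coders, size):
            shared = set.intersection(*(annotations[c] for c in subset))
            if len(shared) > len(best_items) or (
                    len(shared) == len(best_items) and size > len(best)):
                best, best_items = subset, shared
    return best, best_items
```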