cfedermann / appraise
Appraise evaluation system for manual evaluation of machine translation output
Home Page: http://www.appraise.cf/
License: BSD 3-Clause "New" or "Revised" License
The Overview page should always show the number of HITs completed by the current user, the average time per HIT, etc.
However, when no HITs are available, I see only this message: "At this moment, there are no HITs available to work on. Check back soon...".
Our annotators may worry that their work was lost.
On /appraise/admin/evaluation/evaluationtask/add/, the multiple-selection box for choosing users is too small, making it hard to tell how many and which users have been assigned to the task. Given how critical this functionality is, we should consider a more user-friendly interface (i.e., a different widget).
We have just come to preparing the set to be used for measuring intra-annotator agreement in the ranking task, where we have to present the same 96 evaluation items a second time, though preferably not in the original sequence. Since randomized order is not supported by Appraise, we thought of randomizing them in the input files. The original format specified in the Specification Document would allow the system to derive the context from the original file and display the items in the order specified in the task file. Unfortunately, the simplification of the import data scheme breaks that requirement, and randomizing the items before import would mean that the context is not consistent. We would need a way to overcome this problem, i.e., allowing randomization upon choosing the next item, for particular sets.
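A minimal sketch of one way this could work, assuming a hypothetical helper (not part of Appraise) that shuffles the presentation order while retaining each item's original index, so the context can still be derived from the original file:

```python
import random

def randomized_presentation(items, seed=None):
    """Return (original_index, item) pairs in shuffled presentation
    order. Keeping the original index alongside each item means the
    original sequence -- and hence each item's context -- can still
    be recovered after randomization."""
    rng = random.Random(seed)  # fixed seed makes the order reproducible
    order = list(range(len(items)))
    rng.shuffle(order)
    return [(i, items[i]) for i in order]
```

Storing the original index with each presented item is the key point: the display order can be arbitrary, but context lookup always goes through the index into the original file.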
In Firefox, the Reset button does nothing. Even if a user has already selected some ranks, clicking this button doesn't reset them to NIL.
This is a minor issue and doesn't affect functionality, so instead of fixing it, the button could also simply be removed.
Exported CSV files (generated from the admin page) do not have a '\n' character on the last line. This creates problems when the file is concatenated with other files. Please add a trailing newline.
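A trivial sketch of the requested fix, as a hypothetical helper applied to the export text before it is written out:

```python
def ensure_trailing_newline(csv_text):
    """Append a final newline if the export does not already end with
    one, so that concatenating several CSV files never runs the last
    record of one file into the first record of the next."""
    return csv_text if csv_text.endswith("\n") else csv_text + "\n"
```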
Appraise requires Django >=1.3 but the web app doesn't work with 1.3.1.
When pointing the browser to "http://127.0.0.1:8000/appraise/" with the server running, I get the following error:
ImportError at /appraise/
cannot import name patterns
Request Method: GET
Request URL: http://127.0.0.1:8000/appraise/
Django Version: 1.3.1
Exception Type: ImportError
Exception Value:
cannot import name patterns
Exception Location: /home/a/software/Appraise-Software/appraise/../appraise/urls.py in , line 7
Python Executable: /usr/bin/python
Python Version: 2.7.6
Python Path:
['/home/a/software/Appraise-Software/appraise',
'/usr/local/lib/python2.7/dist-packages/langid-1.1.4dev-py2.7.egg',
'/usr/lib/python2.7',
'/usr/lib/python2.7/plat-x86_64-linux-gnu',
'/usr/lib/python2.7/lib-tk',
'/usr/lib/python2.7/lib-old',
'/usr/lib/python2.7/lib-dynload',
'/usr/local/lib/python2.7/dist-packages',
'/usr/lib/python2.7/dist-packages',
'/usr/lib/python2.7/dist-packages/PILcompat',
'/usr/lib/python2.7/dist-packages/gst-0.10',
'/usr/lib/python2.7/dist-packages/gtk-2.0',
'/usr/lib/pymodules/python2.7',
'/usr/lib/python2.7/dist-packages/ubuntu-sso-client']
Server time: Wed, 27 Aug 2014 16:01:59 +0200
"Permission denied" given
How should the system ranking clusters be computed if systems often produce the same output and are merged in the results CSV file? Is using the scripts/compute_ranking_clusters.perl script the correct way?
This script seems to ignore merged systems in the results CSV file (sysA+sysB will be treated as a separate, new system). I have fixed it in this commit in my fork. Was that the correct thing to do, or is there a better way of getting the ranking clusters?
(Without this fix, the clustering script would get stuck in an infinite loop on my data, i.e., several variants of the same NLG system that often produce identical outputs.)
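A sketch of the idea behind such a fix, assuming hypothetical row dictionaries with `system1`/`system2` fields and merged names joined by `+` (the actual column names in the WMT results CSV may differ):

```python
def split_merged_systems(rows):
    """Expand rows whose system fields contain merged names like
    'sysA+sysB' into one row per underlying system, so a clustering
    script never treats 'sysA+sysB' as a brand-new system."""
    out = []
    for row in rows:
        for s1 in row["system1"].split("+"):
            for s2 in row["system2"].split("+"):
                new = dict(row)  # copy all other fields unchanged
                new["system1"], new["system2"] = s1, s2
                out.append(new)
    return out
```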
I don't see which group I am part of. (Silly workaround: do 1 HIT and check which group's progress has increased.)
The name of my group could be shown:
Also in the "Language pair status", the language pairs I have selected could be highlighted/marked.
The file appraise/start-server.sh.sample refers to a file named appraise.conf in the lighttpd invocation. Could you add an example of such a file to the repository for those of us trying to set Appraise up under that server?
Activate the registration view that allows WMT13 participants to request/create user accounts.
Open issues to decide on:
Check whether django.contrib.auth provides generic views for this.
Users should be able to select tasks to work on.
The list of available tasks should be filtered as follows:
This issue was reported by one of the translator agencies participating in TaraXU's evaluation round 2 and was confirmed by Cindy Tscherwinka.
"I am now working on Task 2 and I cannot tick the ‘translate from scratch’ box if none of the sentences can be post-edited easily. I get a ‘stop’ symbol which appears over the box when I try to tick it."
Cindy posted: "I had the same question as I tested the system. At least I think she is referring to that issue: you still have to select one of the sentences that is not easy to post-edit before you can select “translate from scratch”. "
It should be possible from the Django admin backend to retire a campaign.
This should also retire any associated objects such as tasks, items, and maybe results. The corresponding campaign team should also be retired.
If input data is exactly identical (= same JSON, same IDs, same campaign) then we should not create redundant task instances. This only pollutes the database.
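One possible approach, sketched with a hypothetical fingerprint helper: hash the canonicalised task JSON together with the campaign identifier, and reject an import whose fingerprint already exists in the database.

```python
import hashlib
import json

def task_fingerprint(campaign_id, task_json):
    """Stable fingerprint for a task upload: same campaign + same JSON
    (same IDs) => same digest, so a duplicate import can be detected
    and rejected before any redundant task instance is created."""
    # sort_keys makes the digest independent of key ordering
    canonical = json.dumps(task_json, sort_keys=True, separators=(",", ":"))
    payload = "{}\n{}".format(campaign_id, canonical)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Storing the fingerprint in an indexed, unique database column would make the duplicate check a single lookup at import time.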
Instead of having various file-level scripts for shell admin operations, add custom management commands and clean things up...
See the Django documentation here:
Check that we don't generate empty (i.e., sequence of five PLACEHOLDER systems) CSV export lines for RankingResult instances. Check that this is only happening for skipped ranking tasks.
We have seen spaces, non-ASCII characters, and symbols resulting in server errors during signup POST submission. Fix this and give a better error message.
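A minimal sketch of server-side validation, assuming a hypothetical whitelist policy (ASCII letters, digits, hyphen, underscore) that returns a readable message instead of crashing:

```python
import re

# hypothetical policy: 2-30 chars from a conservative ASCII whitelist
USERNAME_RE = re.compile(r"^[A-Za-z0-9_-]{2,30}$")

def validate_username(name):
    """Return None if the name is acceptable, otherwise a
    human-readable error message to show on the signup form."""
    if USERNAME_RE.match(name):
        return None
    return ("User names may only contain ASCII letters, digits, "
            "'-' and '_' (2-30 characters).")
```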
Would it be possible to filter out, or group, equal hypotheses? There is no point in ranking identical outputs, and in fact I lose time when ranking 5 hypotheses trying to find the differences between two of them (they are often identical, and sometimes differ only by one character). It would be easier if we knew that all outputs were different.
In the current interface, an admin cannot see the progress of the tasks unless they grant themselves permission on the task and log in as a normal user.
Hello,
Can you please elaborate what license is Appraise under? I didn't see any license information or file in the repo?
Thanks
If, for some reason, Django encounters an HTTP 404 or 500 error, the page template rendering of either 404.html or 500.html breaks the navigation links at the top of the page. PREFIX_URL is not handed over as a parameter and hence not prepended to the URLs.
Django documentation on customising error views is available here:
The JavaScript-based computation of durations introduced in 9b02f52 needs to be integrated into:
Clean up the repository and remove outdated:
Implement A>B, A=B, A<B ranking task for n=2 translation alternatives.
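The three-way outcome could be derived from the existing rank values; a sketch with a hypothetical helper, assuming a lower rank number means a better translation:

```python
def compare_label(rank_a, rank_b):
    """Map the two assigned ranks of an n=2 task to the three-way
    label A>B / A=B / A<B (lower rank number = better)."""
    if rank_a < rank_b:
        return "A>B"
    if rank_a > rank_b:
        return "A<B"
    return "A=B"
```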
The readme states Appraise should work with Django >=1.3, but it does not work with any 1.3 version, because certain symbols imported in urls.py lived in django.conf.urls.defaults in 1.3, but in django.conf.urls from 1.4 on.
It also does not work with >=1.5, because the patterns API changed, and there is an open issue with 1.6.
To the best of my knowledge, it only works with 1.4.20, which will be supported until October 2015.
Could you update the readme to reflect this?
Use random sample of 10 HITs per WMT14 language pair and allow infinite collection of annotation results on these...
Create DEMO group for demo users.
Should be easy, however needs some testing to make sure nothing is broken for evaluation campaign R2...
All users should be able to see the required hours per group (REQUIRED_HOURS_PER_GROUP). Use color coding to make people feel better ;)
When installing and running the collectstatic command, it couldn't find jquery.js. Do I have to install other packages besides django==1.3?
~/Appraise-Software/appraise$ python manage.py collectstatic
WARNING:appraise.utils:NLTK is NOT available, using fallback AnnotationTask class instead. This does NOT implement any NLTK features!
You have requested to collect static files at the destination
location as specified in your settings file.
This will overwrite existing files.
Are you sure you want to do this?
Type 'yes' to continue, or 'no' to cancel: yes
Copying '/home/ltan/Appraise-Software/appraise/static/admin/js/jquery.js'
Traceback (most recent call last):
File "manage.py", line 23, in <module>
execute_manager(settings)
File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 438, in execute_manager
utility.execute()
File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 379, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 191, in run_from_argv
self.execute(*args, **options.__dict__)
File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 220, in execute
output = self.handle(*args, **options)
File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 351, in handle
return self.handle_noargs(**options)
File "/usr/local/lib/python2.7/dist-packages/django/contrib/staticfiles/management/commands/collectstatic.py", line 89, in handle_noargs
self.copy_file(path, prefixed_path, storage, **options)
File "/usr/local/lib/python2.7/dist-packages/django/contrib/staticfiles/management/commands/collectstatic.py", line 199, in copy_file
shutil.copy2(source_path, full_path)
File "/usr/lib/python2.7/shutil.py", line 130, in copy2
copyfile(src, dst)
File "/usr/lib/python2.7/shutil.py", line 83, in copyfile
with open(dst, 'wb') as fdst:
IOError: [Errno 2] No such file or directory: u'/static-files/admin/js/jquery.js'
It's just been noticed that when a big word appears on the left column of the error classification pane, some of the radio buttons get "wrapped" to the next line, although there is a lot of space around the "summary box" where the left column could expand. To reproduce it, just narrow your browser window. You will see that the right column reserves empty space around the summary box whereas the left column gets suppressed/wrapped.
Many times, the system outputs for a sentence are identical. Rather than constructing each task from a random subset of systems, each task should be constructed from the set of distinct outputs for that sentence. The pairwise rankings could then be re-associated with the systems to generate a larger set of pairwise rankings.
This would be a bit more respectful of people's times (it's annoying to see identical outputs), and would also let us potentially gather data more quickly. On the WMT14 data, for example, there are identical system outputs on over half the sentences.
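A sketch of both halves of this proposal, using hypothetical helper names: group systems by identical output text, then expand group-level comparisons back into system-level pairwise rankings.

```python
from collections import defaultdict
from itertools import combinations

def distinct_output_groups(outputs):
    """Group systems by identical output text. outputs maps
    system name -> output string; annotators would rank one
    representative per group."""
    groups = defaultdict(list)
    for system, text in outputs.items():
        groups[text].append(system)
    return list(groups.values())

def expand_pairwise(ranked_groups):
    """Given groups of systems ordered best-first, re-associate each
    group-level comparison with every underlying system pair,
    yielding (better, worse) tuples."""
    pairs = []
    for better, worse in combinations(ranked_groups, 2):
        for b in better:
            for w in worse:
                pairs.append((b, w))
    return pairs
```

Systems within the same group produced identical output, so no pair is emitted between them; only cross-group comparisons carry ranking information.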
CC: @cfedermann
The tag 'appraise-results' is closed by the tag 'appraise-result' (missing 's').
Copy admin files to static folder for deployment without DEBUG=True.
It seems I forgot my password...
After downloading and running Appraise it seems that the current code is intended to work only for WMT15 evaluation. I would like to create my own evaluation tasks but the corresponding link does not exist. How can I use the current version of Appraise for preparing my own evaluations?
Extend user accounts (or associated profiles) with information about a user's affiliation.
OR in case that does not work, create respective groups inside Django's admin backend and thus allow to assign users to "affiliation groups".
The latter option would work out-of-the-box.
Add code to properly randomize SECRET_KEY inside settings.py, as likely none of the users would otherwise do it.
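A sketch of one way to generate such a key, modelled on the character set the Django-1.x startproject template used (the exact alphabet is an assumption here; any sufficiently large random string works):

```python
import random
import string

def make_secret_key(length=50):
    """Generate a random SECRET_KEY string using the OS entropy pool
    (random.SystemRandom), 50 characters drawn from letters, digits,
    and a handful of punctuation characters."""
    chars = string.ascii_lowercase + string.digits + "!@#$%^&*(-_=+)"
    rng = random.SystemRandom()
    return "".join(rng.choice(chars) for _ in range(length))
```

This could run once at install time and write the result into settings.py, so that no two deployments share the default key.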
In ascertaining whether enough HITs have been collected, it would be really helpful if the Language pair status page listed the number of systems, e.g.,
English → Czech (X systems) 519 remaining 1481 completed
Add updated dictionary with current HIT completion requirements for WMT15 participants.
There is a validation error every time I press submit on the Validation task. It is reproducible for every existing ranking task.
These are two minor (mostly template-related) issues observed in Firefox 8. They do not affect functionality.
Have a look here: http://www.dfki.de/~elav01/tmp/appraise-screenshot.png
Add an additional "language pairs" attribute to user profiles.
This does require the addition of customised user profiles;
see the Django documentation over here:
Seems this has improved greatly in Django v1.5:
The web application crashes with Django 1.5. This is the error shown in the browser:
NoReverseMatch at /appraise/
'url' requires a non-empty first argument. The syntax changed in Django 1.5, see the docs.
Request Method: GET
Request URL: http://127.0.0.1:8000/appraise/
Django Version: 1.5
Exception Type: NoReverseMatch
Exception Value:
'url' requires a non-empty first argument. The syntax changed in Django 1.5, see the docs.
Exception Location: /usr/local/lib/python2.7/dist-packages/django/template/defaulttags.py in render, line 402
Python Executable: /usr/bin/python
Python Version: 2.7.6
Python Path:
['/home/a/software/Appraise-Software/appraise',
'/usr/local/lib/python2.7/dist-packages/langid-1.1.4dev-py2.7.egg',
'/usr/lib/python2.7',
'/usr/lib/python2.7/plat-x86_64-linux-gnu',
'/usr/lib/python2.7/lib-tk',
'/usr/lib/python2.7/lib-old',
'/usr/lib/python2.7/lib-dynload',
'/usr/local/lib/python2.7/dist-packages',
'/usr/lib/python2.7/dist-packages',
'/usr/lib/python2.7/dist-packages/PILcompat',
'/usr/lib/python2.7/dist-packages/gst-0.10',
'/usr/lib/python2.7/dist-packages/gtk-2.0',
'/usr/lib/pymodules/python2.7',
'/usr/lib/python2.7/dist-packages/ubuntu-sso-client']
Server time: Wed, 27 Aug 2014 16:09:03 +0200
Add a button that allows resetting the error classes for a single word only.
Matt pointed out these points:
(A nice way would be a gradient from green to red aligned with the buttons)
Appraise crashes with Django 1.6.1 (the version of Django included in Ubuntu 14.04).
When running "python manage.py syncdb", I get the following error:
Traceback (most recent call last):
File "manage.py", line 7, in
from django.core.management import execute_manager
ImportError: cannot import name execute_manager
My average time across two language pairs, 4' for each, is shown as 8' instead of 4'.
It would be nice to have a URL I could query to automate a RankingResult CSV export. e.g.,
wget -O backup.csv http://appraise.cf/admin/wmt14/rankingresult/
I think security-through-obscurity would be fine here, but requiring a secret token or something via a GET parameter would also work.
Compute inter-annotator agreement (IAA) for the number of annotators or coders (C) that maximises the number of items (I) evaluated by the respective subset of coders. This avoids showing no IAA scores until all coders have completed the task.
Additionally, there could be checkboxes to toggle whether a system should be included in the IAA computation. That way, different scenarios could be explored in a more playful way, from within the status view.
If implemented, the checkbox selection should also be reflected when downloading results data.
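The coder-subset selection described above could be sketched as follows, with a hypothetical helper that tries every subset of coders and keeps the one whose shared items are most numerous (a brute-force search; fine for the small coder counts typical of these campaigns):

```python
from itertools import combinations

def best_coder_subset(annotations):
    """annotations maps coder -> set of item ids they have judged.
    Return (coders, shared_items) for the subset of at least two
    coders whose commonly-judged items are most numerous, preferring
    larger subsets on ties. This lets IAA be reported before every
    coder has finished the task."""
    coders = list(annotations)
    best, best_items = (), set()
    for size in range(2, len(coders) + 1):
        for subset in combinations(coders, size):
            shared = set.intersection(*(annotations[c] for c in subset))
            if len(shared) > len(best_items) or (
                    len(shared) == len(best_items) and size > len(best)):
                best, best_items = subset, shared
    return best, best_items
```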