dkalpakchi / textinator

An internationalized highly customizable annotation and evaluation tool for Natural Language Processing (NLP) tasks

Home Page: https://textinator.readthedocs.io/en/latest/

License: GNU Affero General Public License v3.0

Python 49.74% JavaScript 30.72% HTML 17.14% CSS 2.04% Shell 0.22% Dockerfile 0.14%
annotation-tool data-annotation data-labeling data-labeling-tools dataset datasets machine-learning natural-language-generation natural-language-processing python

textinator's Introduction

⚠️ The project is undergoing major refactoring. The changes in the new version are likely to be breaking, and dumps from old versions will not be compatible. ⚠️

New here?

Check out some introductory resources:

Try out Textinator on your own machine

First you will need to install Docker and docker-compose. Afterwards just follow these steps:

  1. Clone this repository by running git clone https://github.com/dkalpakchi/Textinator.git or download one of the releases and unpack it.
  2. Build and run the container in either development or production mode, following the instructions in the corresponding section below.

Deployment guide

The recommended way of deploying Textinator is to build a production version of the Docker container, as described in the Deployment guidelines. The production version is more secure and reliable than the development version. However, it does not scale well, because it hosts both the database and the Textinator instance on the same machine. A more scalable solution would be to use something like Kubernetes, but this is currently not supported out of the box.

Developer guide

A good starting point for familiarizing yourself with the codebase is our API documentation. The developer documentation is an ongoing effort, but some established workflows are described in our Development guidelines (for instance, how to run a development Docker instance).

Contributing

Want to contribute to Textinator? Check out our Contribution guidelines.

Internationalization

The software is developed in English.

Partial translation is available for these languages (in alphabetical order):

Upcoming languages

  • Dutch
  • Spanish

Credits

Textinator depends on so many other wonderful open-source projects that they deserve a special Credits file.

Cite Textinator

@inproceedings{kalpakchi-boye-2022-textinator,
    title = "Textinator: an Internationalized Tool for Annotation and Human Evaluation in Natural Language Processing and Generation",
    author = "Kalpakchi, Dmytro  and
      Boye, Johan",
    booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2022.lrec-1.90",
    pages = "856--866"
}

textinator's Issues

Unable to submit and clear the field(s)

Describe the bug
Submission appears to fail, judging by what is shown on the screen, and the marked fields cannot be cleared after submission.
The annotation is saved, though.

Expected behavior
The annotation is submitted and the tool switches to a new question/text.

Screenshots
Screenshot (103)
Screenshot (106)

Textinator version
The latest

Desktop (please complete the following information):

  • OS: Windows 10
  • Browser: Firefox
  • Version: 106.0.3

Marked spans in the text are not saved correctly for lists

Describe the bug
Marked spans in the text are not saved correctly for HTML lists (ul, ol), when the dataset is stored as markdown.

To Reproduce
Create a dataset with a markdown list and use it for a project that has at least one span marker. Annotate a span on one of the list items (<li>). The annotation is displayed correctly, but it is saved incorrectly (with wrong offsets).

Expected behavior
The saved offsets should be correct
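
The sketch below illustrates, under loud assumptions, one way such an offset mismatch can arise: offsets computed against the rendered text differ from offsets in the stored Markdown source by the length of the list markers that precede the span. It is not Textinator's code. The helpers `rendered_text` and `span_offsets` are hypothetical, and the regular expression is a deliberately simplified stand-in for real Markdown rendering.

```python
import re
from typing import Tuple


def rendered_text(markdown_source: str) -> str:
    """Strip leading list markers ("- ", "* ", "1. ") from every line.

    Hypothetical, simplified stand-in for Markdown-to-text rendering.
    """
    stripped = [
        re.sub(r"^\s*(?:[-*+]|\d+\.)\s+", "", line)
        for line in markdown_source.splitlines()
    ]
    return "\n".join(stripped)


def span_offsets(text: str, phrase: str) -> Tuple[int, int]:
    """Return the half-open character range of the first match of `phrase`."""
    start = text.index(phrase)
    return start, start + len(phrase)


source = "Intro\n- first item\n- second item"
visible = rendered_text(source)

source_span = span_offsets(source, "second item")    # offsets in the stored Markdown
visible_span = span_offsets(visible, "second item")  # offsets in what the annotator sees

# The two list markers ("- " twice) shift the span by four characters.
shift = source_span[0] - visible_span[0]
```

If offsets measured on the visible text are applied to the source without accounting for this shift, they point at the wrong characters, which matches the symptom described in this report.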

Textinator version
Latest

Desktop (please complete the following information):

  • OS: Ubuntu 22.10
  • Browser: Firefox

Blank spaces inside a marked field

Describe the bug
Blank/white spaces appear inside a marked field. When the marking is redone (the first marking was not fully correct, so it was cleared and repeated with the necessary corrections), an additional blank/white space appears.

Expected behavior
No blank/white spaces to appear inside the marked zone.

Screenshots
Screenshot (97)
Screenshot (98)

Textinator version
The latest

Desktop (please complete the following information):

  • OS: Windows 10
  • Browser: Firefox
  • Version: 106.0.3

Marking does not happen (2)

Describe the bug
The field to be marked consists of 12 words and is contained in one of the previously marked fields (25 words). Both fields are marked as bases for alternatives, and both are part of the same paragraph, which is marked as a key paragraph. The 12 words cannot be marked on the first attempt; 2-3 tries are needed.

Marked before: the skills of physical combat build up confidence to the point that one does not feel the constant need to defend one's honor through fighting
To be marked: does not feel the constant need to defend one's honor through fighting

Expected behavior
A field of 12 words marked without repeated actions.

Screenshots
If applicable, add screenshots to help explain your problem.

Textinator version
The latest

Desktop (please complete the following information):

  • OS: Windows 10
  • Browser: Firefox
  • Version: 106.0.3

Screenshot (91)

Wrong annotations on the border between the markers

Describe the bug
Sometimes the annotation is wrong when a newly marked span borders an existing span.

To reproduce
To reproduce what is seen in the screenshot, annotate everything except the yellow marker. Then try to mark the yellow marker, starting from "I walked right..." and then a couple of problems will happen.

  1. The marker starts in the wrong place (from the beginning of the paragraph)
  2. The empty yellow marker appears before the beginning of the paragraph
  3. The empty purple markers appear afterwards (these appear after removing this yellow span and then re-marking a couple of times, it seems)

Expected behavior
Just the yellow span should be marked properly, starting from "I walked right..."

Screenshots
textinator_bug

Desktop (please complete the following information):

  • OS: Windows 10
  • Browser: Firefox
  • Version: 106.0.3

End of a marked field shifted to the next paragraph

Describe the bug
Green field ended at the start of the next paragraph.

Expected behavior
The green field should end within the same paragraph.

Screenshots
20221110 - 1 paragraph

Textinator version
The latest

Desktop (please complete the following information):

  • OS: Windows 10
  • Browser: Firefox
  • Version: 106.0.3

Inconsistent padding for nested labels

Describe the bug
After creating a nested label of more than 2 levels, the next nested label (that has the same parent label) messes up the padding of the parent

To Reproduce
For the text a b c d e, first label it like this: [[a [b] c] d e]. Then attempt to add a new label: [[a [b] c] d [e]]. The padding of the outermost label will then be incorrect.

Expected behavior
The padding should be correct
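
As a way of reasoning about the expected result, the sketch below assumes (this is an assumption, not a description of Textinator's rendering code) that the padding of a label depends only on how many levels are nested inside it. Under that assumption, adding the sibling label [e] should leave the outermost label unchanged. The `Span` type and both helper functions are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open token range [start, end)


def contains(outer: Span, inner: Span) -> bool:
    """True if `inner` lies within `outer` and is a different span."""
    return outer != inner and outer[0] <= inner[0] and inner[1] <= outer[1]


def nesting_height(span: Span, spans: List[Span]) -> int:
    """Number of label levels nested inside `span` (0 for a leaf label)."""
    nested = [s for s in spans if contains(span, s)]
    if not nested:
        return 0
    return 1 + max(nesting_height(s, spans) for s in nested)


# Tokens: a=0, b=1, c=2, d=3, e=4
before = [(0, 5), (0, 3), (1, 2)]  # [[a [b] c] d e]
after = before + [(4, 5)]          # [[a [b] c] d [e]]

height_before = nesting_height((0, 5), before)
height_after = nesting_height((0, 5), after)
```

Both heights come out equal, so any padding derived from nesting depth alone would stay the same for the outermost label after [e] is added.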

Screenshots
screenshot_20230127_131349

Textinator version
Latest

Desktop (please complete the following information):

  • OS: Ubuntu
  • Browser: Firefox
  • Version: 110

<br> tags sometimes disappear after annotating and removing the marker

Describe the bug
When annotating whole paragraphs and then removing the marker a couple of times, sometimes the last <br> tag disappears, merging both paragraphs, which can result in a failed annotation.

To reproduce
Couldn't find exact steps to reproduce reliably.

Expected behavior
<br> tags should remain in place and the paragraph division should stay intact.

Desktop (please complete the following information):

  • OS: Ubuntu
  • Browser: Firefox
  • Version: 107

Marked fields are not complete / extra symbols included

Describe the bug
Fields marked as bases for an answer do not appear fully marked after submission. Sometimes a few extra symbols before or after the key field are included.

Expected behavior
Screenshot 1: "I try to support my son as much as I can." to be marked in purple.
Screenshot 2: "[...] Holz Wooden Airport." to be underlined as a part of the key paragraph.

Screenshots
image
image

Textinator version
The latest

Desktop (please complete the following information):

  • OS: Windows 10
  • Browser: Firefox
  • Version: 106.0.3

Additional context
Results in the message "The annotations are likely to be restored incorrectly."

Wrong annotation when selecting line-by-line

Describe the bug
When annotating whole paragraphs, selecting text line by line rather than character by character fails to trigger the marking.

To reproduce

  1. Open any text and select a span by lines (using the left side of the text area)
  2. Attempt to mark the text as anything using either a hotkey or the marker in the markers area
  3. The selection disappears, but the text is not marked with that marker

Expected behavior
The text should be marked with the selected marker

Desktop (please complete the following information):

  • OS: Ubuntu
  • Browser: Firefox
  • Version: 107
  • Textinator version: v1.1.0

Creating a nested span on 4th level of nesting fails

Describe the bug
Attempting to create a span that covers 2 already existing spans fails. Interestingly, this fails only if the first span already has 3+ levels of nesting.

To Reproduce

  1. Get the spans and create a selection as in picture 1
  2. Attempt to create a new span over the 2 in picture 1 (see result in picture 2)

Expected behavior
The span should be created correctly

Screenshots
Picture 1:
screenshot_20230129_141019

Picture 2:
screenshot_20230129_141027

Textinator version
Latest

Desktop (please complete the following information):

  • OS: Ubuntu 22.10
  • Browser: Firefox
  • Version: 110

Editing

Describe the bug
Unable to delete a comment (Other, Additional commentaries). The comment can be changed, but at least one symbol has to remain in the field.

Expected behavior
An option to completely clear the field would be useful if the issue is already solved.

Textinator version
The latest.

Desktop (please complete the following information):

  • OS: Windows 10
  • Browser: Firefox
  • Version: 106.0.3

Marking does not happen

Describe the bug
The field to be marked consists of 4 tokens (1 number, 3 words) and is contained in one of the previously marked fields (5 tokens). Both fields are marked as bases for alternatives. The four tokens cannot be marked on the first attempt; 2-3 tries are needed.

Whole quote:
They've changed the cans and oxygen tanks--in one case, part of the remains of a helicopter--into 74 pieces of art that have already gone on exhibition in Nepal's capital.
Marked before: into 74 pieces of art
To be marked: 74 pieces of art

Expected behavior
A field of 4 tokens to be marked inside another one of 5 tokens with no repeated actions required.
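
To make the geometry of this case concrete, the sketch below computes character offsets for both fields in the quoted sentence. It only illustrates the relationship between the two spans (the inner one starts later and shares the end boundary with the outer one). It says nothing about how Textinator handles selections internally, and `find_span` is a hypothetical helper.

```python
from typing import Tuple

quote = (
    "They've changed the cans and oxygen tanks--in one case, part of the "
    "remains of a helicopter--into 74 pieces of art that have already gone "
    "on exhibition in Nepal's capital."
)


def find_span(text: str, phrase: str) -> Tuple[int, int]:
    """Return the half-open character range of the first match of `phrase`."""
    start = text.index(phrase)
    return start, start + len(phrase)


outer = find_span(quote, "into 74 pieces of art")  # marked before
inner = find_span(quote, "74 pieces of art")       # to be marked

# The inner span starts right after "into " and ends where the outer span ends.
shares_end_boundary = inner[1] == outer[1]
starts_inside = outer[0] < inner[0] < outer[1]
```

A shared end boundary between a new span and an existing one is the kind of edge case worth checking first when trying to reproduce this report.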

Textinator version
The latest

Desktop (please complete the following information):

  • OS: Windows 10
  • Browser: Firefox
  • Version: 106.0.3

Screenshot (88)

Search short words

Describe the bug
Unable to search for short words like 'how', 'why', 'who', 'not', 'I', 'you', 'they' etc. - the process ends up with "Nothing to show".
Finds something for 'cat', 'dog', though :)

Textinator version
The latest

Desktop (please complete the following information):

  • OS: Windows 10
  • Browser: Firefox
  • Version: 106.0.3

Making a multi-paragraph annotation, when paragraphs are comprised of <p> tags, doesn't work

Describe the bug
Making a multi-paragraph annotation, when paragraphs are comprised of <p> tags, doesn't work.

To Reproduce

  1. Attempt to mark as in the first screenshot
  2. Actually mark and get a result in the second screenshot.

Screenshots
screenshot1
screenshot2

Expected behavior
The marked text from Screenshot 1 should be simply marked with a green label

Textinator version
Latest

Desktop (please complete the following information):

  • OS: Ubuntu
  • Browser: Firefox
  • Version: 108.0

The information from the label plugins is not saved

Describe the bug
When a text field plugin is added to a marker, the assigned text is not saved.

To Reproduce
Steps to reproduce the behavior:

  1. Add a text field plugin, save the project
  2. Mark something in the text, right click to get the context menu item
  3. Add the text to the text field, submit the label
  4. Update the page with Ctrl + Shift + R, and attempt to edit that label

Expected behavior
The text should be saved

Textinator version
1.3.0

Desktop (please complete the following information):

  • OS: Ubuntu
  • Browser: Firefox

Lost/changed focus -> unable to mark a field in colour

Describe the bug
Unable to mark the second red field.

To Reproduce
Steps to reproduce the behavior:

  1. Mark a paragraph as a key one (underline).
  2. Mark a phrase (in the middle of the paragraph) as 'Basis for A'
  3. Mark another phrase (at the end of the same paragraph) as 'Basis for A'
  4. Stop your mouse cursor at the cross which stands for the end of the paragraph.
  5. See error

Expected behavior
It should be possible to mark the phrase at the end of the paragraph.

Screenshots
Screenshot (100)

Textinator version
The latest

Desktop (please complete the following information):

  • OS: Windows 10
  • Browser: Firefox
  • Version: 106.0.3
