Comments (9)
+1
I'm doing sequence-to-sequence labeling right now, but the data sometimes contains grammatical errors or typos. It would be really great to be able to fix these errors directly in doccano; right now I have to export the data from doccano to my computer, edit it, and then import it back into doccano.
from doccano.
Yes. We already have an API for editing documents, so we can implement this feature by implementing the frontend.
But one problem is that if we edit a document that has annotations in a sequence labeling project, the annotated data will become useless because the stored offsets no longer match the text. An example is as follows:
Original Text: Plsident Obama
Annotation: {'start_offset': 9, 'end_offset': 14}
Edited Text: Plesident Obama
Annotation: {'start_offset': 9, 'end_offset': 14} # incorrect
True annotation: {'start_offset': 10, 'end_offset': 15}
One solution is to delete all of a document's annotations when we edit it. This is my thought.
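The offset drift in the example above can be reproduced with plain string slicing. This is only a minimal sketch, not doccano code: the helper name `span_text` is hypothetical, and it assumes offsets are character indices with an exclusive end, as the example suggests.

```python
def span_text(text, annotation):
    """Return the substring covered by an annotation's character offsets."""
    return text[annotation["start_offset"]:annotation["end_offset"]]


original = "Plsident Obama"
edited = "Plesident Obama"
annotation = {"start_offset": 9, "end_offset": 14}

print(span_text(original, annotation))      # Obama
print(repr(span_text(edited, annotation)))  # ' Obam'

# Inserting one character before the span shifts the true offsets by one.
shifted = {"start_offset": 10, "end_offset": 15}
print(span_text(edited, shifted))           # Obama
```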
from doccano.
Yes, this is a very clear and straightforward request.
from doccano.
Lack of information.
from doccano.
Is this feature planned?
from doccano.
Well, other possible options might be:
- only delete labels that have start_offset after the edited character position
- compare the actual values, old_doc[start_offset:end_offset] == new_doc[start_offset:end_offset], and keep the labels for which they are equal (this covers the first option as well as cases where an edit swaps characters without adding or removing any outside the labeled items)
But this might still not work for everyone.
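The two options above could be sketched roughly as follows. This is an illustration under assumptions, not doccano's implementation: the function names are hypothetical, labels are assumed to be dicts with `start_offset` and `end_offset`, and treating a label that starts exactly at the edit position as affected is also an assumption.

```python
def keep_labels_before_edit(labels, edit_position):
    """Option 1: drop every label that starts at or after the edit position."""
    return [lb for lb in labels if lb["start_offset"] < edit_position]


def keep_unchanged_spans(labels, old_doc, new_doc):
    """Option 2: keep a label only if its slice is identical in both texts."""
    return [
        lb for lb in labels
        if old_doc[lb["start_offset"]:lb["end_offset"]]
        == new_doc[lb["start_offset"]:lb["end_offset"]]
    ]


# An insertion at index 16 shifts everything after it.
old_doc = "Barack Obama visted Paris"
new_doc = "Barack Obama visited Paris"
labels = [
    {"start_offset": 0, "end_offset": 12},   # "Barack Obama"
    {"start_offset": 20, "end_offset": 25},  # "Paris" in old_doc
]
print(keep_labels_before_edit(labels, 16))
print(keep_unchanged_spans(labels, old_doc, new_doc))

# A swap before the label changes no offsets, so option 2 keeps the label
# while option 1 would drop it.
swap_labels = [{"start_offset": 4, "end_offset": 9}]  # "Obama"
print(keep_unchanged_spans(swap_labels, "teh Obama", "the Obama"))
print(keep_labels_before_edit(swap_labels, 1))
```

Note that option 1 keeps a label that starts before the edit but extends past it, so its end offset may still be wrong; that is one of the cases where this might not work for everyone.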
from doccano.
I think it is not a big issue.
An alternative solution is to save both the original text and the modified text.
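The comment does not say how the two saved versions would be used. One possibility, shown here purely as an assumption, is to diff them and shift the offsets of labels whose text was left untouched. The sketch uses Python's standard `difflib.SequenceMatcher`; the function name `remap_label` is hypothetical.

```python
from difflib import SequenceMatcher


def remap_label(original, modified, label):
    """Shift a label's offsets from `original` to `modified`.

    Returns None when the labeled span is not fully inside a single
    unchanged region, i.e. when the edit touched the labeled text.
    """
    start, end = label["start_offset"], label["end_offset"]
    matcher = SequenceMatcher(None, original, modified, autojunk=False)
    for tag, i1, i2, j1, _j2 in matcher.get_opcodes():
        if tag == "equal" and i1 <= start and end <= i2:
            shift = j1 - i1
            return {"start_offset": start + shift, "end_offset": end + shift}
    return None


original = "Plsident Obama"
modified = "Plesident Obama"

# The label on "Obama" lies in an unchanged region and is shifted by one.
print(remap_label(original, modified, {"start_offset": 9, "end_offset": 14}))

# A label on the edited word itself cannot be remapped this way.
print(remap_label(original, modified, {"start_offset": 0, "end_offset": 8}))
```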
from doccano.
Implementing a data editing feature in doccano is a little overwhelming. It would be good to define the scope of the tool.
from doccano.
I support this suggestion. I am seeing incorrectly spelled text, wrong punctuation, and so on. I would love to have the capability to fix typos and grammatical errors as I see them.
from doccano.