chrisjbryant / errant
ERRor ANnotation Toolkit: Automatically extract and classify grammatical errors in parallel original and corrected sentences.
License: MIT License
Hi,
The pip install doesn't work, and neither does installing from source.
The problem is the outdated spaCy dependency: pip can't build wheels for it, and importing errant throws errors. Manually updating spaCy seems to solve it. (I have yet to use errant extensively with this setup, so I might find it didn't.)
python 3.7.3
spacy 2.2.4
gcc 8.3 if relevant
Datasets like FCE have been standardised into M2 format using ERRANT. But some models take parallel corpora as input, so how does one convert an M2 file back to the parallel corpus format? (The reverse of what ERRANT does.)
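For what it's worth, the reverse direction is mechanical, since each M2 block stores the original sentence plus token-span edits. A minimal sketch (not ERRANT functionality; it assumes one annotator's edits, sorted by start offset and non-overlapping):

```python
# Sketch of the reverse conversion: one M2 block -> (original, corrected).
def m2_to_parallel(m2_block, annotator_id=0):
    lines = m2_block.strip().split("\n")
    orig = lines[0][2:].split()               # drop the leading 'S '
    cor = list(orig)
    offset = 0                                # net length change so far
    for line in lines[1:]:
        fields = line[2:].split("|||")        # drop the leading 'A '
        start, end = map(int, fields[0].split())
        cat, repl, coder = fields[1], fields[2], int(fields[-1])
        if coder != annotator_id or cat in ("noop", "UNK", "Um"):
            continue
        repl_toks = repl.split() if repl != "-NONE-" else []
        cor[start + offset:end + offset] = repl_toks
        offset += len(repl_toks) - (end - start)
    return " ".join(orig), " ".join(cor)

block = """S This are gramamtical sentence .
A 1 2|||R:VERB:SVA|||is|||REQUIRED|||-NONE-|||0
A 2 2|||M:DET|||a|||REQUIRED|||-NONE-|||0
A 2 3|||R:SPELL|||grammatical|||REQUIRED|||-NONE-|||0"""
print(m2_to_parallel(block)[1])  # This is a grammatical sentence .
```

Running this over every block in an M2 file gives one original/corrected pair per block.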
How can I eliminate this error report "AttributeError: 'English' object has no attribute 'tagger'"? I changed several data models, but they didn't work.
code:
import errant
annotator = errant.load('en')
orig = annotator.parse('This are gramamtical sentence .')
cor = annotator.parse('This is a grammatical sentence .')
edits = annotator.annotate(orig, cor)
for e in edits:
    print(e.o_start, e.o_end, e.o_str, e.c_start, e.c_end, e.c_str, e.type)
Using the most up-to-date GitHub clone, we have found that ERRANT output (e.g. on W&I, as attached) sometimes categorizes an error as unnecessary (U:) although it is a replacement.
See for example:
S The rich people will buy a car but the poor people always need to use a bus or taxi .
A 0 2|||U:DET|||Rich|||REQUIRED|||-NONE-|||0
What is the domain of Error.o_start, etc.? Are they token indices within spaCy docs?
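As far as I can tell, yes: o_start and o_end are slice-style token offsets into the parsed original sentence, so an edit's span can be recovered like this (the offsets below are hypothetical, chosen for illustration):

```python
# The parse is whitespace-tokenised, so offsets line up with .split()
orig_tokens = 'This are gramamtical sentence .'.split()
o_start, o_end = 1, 2              # hypothetical edit span covering 'are'
print(orig_tokens[o_start:o_end])  # ['are']
```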
In the documentation you have mentioned that spacy 2.0 is less compatible with ERRANT. What is the nature of this incompatibility and any pointers on what can be done to correct for it?
Note to self: update errant to work with spacy 3.
I am not able to generate m2 files for the case when annotations are missing for certain sentences for some of the annotators. Choosing orig==annotated has its side effects. Am I missing something?
Hi.
I am running into the following error:
For the source, target pairs:
source: In the article mrom The the New York Times.
target: In the article from The New York Times.
The edit mrom -> from is missed by ERRANT. The output from ERRANT was:
["Orig: [4, 6, 'The the'], Cor: [4, 5, 'The'], Type: 'U:DET'"]
On digging a little, it seems to be an issue with all alignments of the following form:
Input: w1 w2 w3
Output: w4 w5
such that w3.lower() == w5.lower()
Alignment Sequence: S w1 -> w4, D w2 -> "", S w3 -> w5
Then the edit "w1" -> "w4" is missed, and "w2 w3" -> "w5" is generated by errant.en.merger.process_seq
Example:
source: "In thir the"
target: "On The"
Errant Output: ["Orig: [1, 3, 'Thir the'], Cor: [1, 2, 'The'], Type: 'U:NOUN'"]
# Missing In -> On
Hi dear author, ERRANT is such an excellent tool, and I'm very happy to see that the character-level cost in the sentence alignment function is now computed by the much faster python-Levenshtein library instead of Python's native difflib.SequenceMatcher, which makes ERRANT 3x faster.
I want to know if there are other potential explorations that can increase the speed.
Can you give me some clues? Thank you very much!
Hi Chris, thanks for the big 2.0 updates!
This is regarding the following section of the README
Note: ERRANT does not support spaCy 2 at this time. spaCy 2 POS tags are slightly different from spaCy 1 POS tags and so ERRANT rules, which were designed for spaCy 1, may not always work with spaCy 2.
Since Python can't handle having multiple versions of a given library in a single project, and we need features that were introduced after spaCy 2.0, we currently have to keep ERRANT isolated in a separate service which we talk to over HTTP. This is not ideal. Since ERRANT now supports passing in an nlp spacy object, it seems like adding support for spacy >= 2.0 would not be bad.
Specifically, I think we could check nlp._meta['spacy_version']. If the spaCy version is less than 2.0, nlp._meta doesn't exist; above 2.0, it gives us the exact spaCy version. For the current purpose, just testing is_spacy_2_or_above = bool(getattr(nlp, "_meta", False)) should be enough. Then the quickest fix would be to just map the 2.0 tags to 1.9 tags if is_spacy_2_or_above.
Is this acceptable? If not, is there some other path to supporting spacy 2.0+? Thank you!
EDIT: we are happy to work on this, we'd just like to find an approach that you would approve.
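The proposed check is easy to sketch; the classes below are dummy stand-ins (not real spaCy objects), just to show the behaviour of the attribute test described above:

```python
def is_spacy_2_or_above(nlp):
    # spaCy 1.x pipelines have no _meta attribute; 2.x+ expose it, and
    # _meta['spacy_version'] then gives the exact version string.
    return bool(getattr(nlp, "_meta", False))

class FakeSpacyV1:                 # dummy stand-in for a spaCy 1.x pipeline
    pass

class FakeSpacyV2:                 # dummy stand-in for a spaCy 2.x pipeline
    _meta = {"spacy_version": "2.3.1"}

print(is_spacy_2_or_above(FakeSpacyV1()))  # False
print(is_spacy_2_or_above(FakeSpacyV2()))  # True
```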
Are there any scripts for that?
In compare_m2.py, the edits for a coder obtained from the extract_edits() function are in the form of (start,end):category.
While comparing the extracted edits for the hypothesis and gold corrections in the compareEdits() function here:
Line 100 in fb3196e
# On occasion, multiple tokens at same span.
for h_cat in ref_edits[h_edit]: # Use ref dict for TP
    tp += 1
    # Each dict value [TP, FP, FN]
    if h_cat in cat_dict.keys():
        cat_dict[h_cat][0] += 1
    else:
        cat_dict[h_cat] = [1, 0, 0]
the hypothesis and reference edits are first matched on (start,end), and then they are checked to see whether their error categories match. If both the (start,end) span and the error categories for a hypothesis edit and a reference edit are equal, then it is counted as a true positive. For example, suppose the hypothesis edit is (6,7):R:NOUN:NUM and the reference edit is (6,7):R:NOUN:NUM. Here their (start,end) spans and error categories are the same, and hence they are counted as a true positive.
Hi Chris,
Thanks for your updating new packages!
This is regarding an install error: when I install your package, both from pip and from source, it gives me the following error messages:
Installing collected packages: numpy, murmurhash, cymem, preshed, wrapt, tqdm, toolz, cytoolz, plac, six, dill, termcolor, pathlib, thinc, pip, ujson, idna, urllib3, chardet, certifi, requests, regex, webencodings, html5lib, wcwidth, ftfy, spacy, nltk, python-Levenshtein, errant Running setup.py install for murmurhash ... error ERROR: Command errored out with exit status 1: command: /Users/helen/errant/errant/errant_env/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/1_/s5yrcv056kq02m5_6v6pv23c0000gn/T/pip-install-972fo3pe/murmurhash/setup.py'"'"'; __file__='"'"'/private/var/folders/1_/s5yrcv056kq02m5_6v6pv23c0000gn/T/pip-install-972fo3pe/murmurhash/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/1_/s5yrcv056kq02m5_6v6pv23c0000gn/T/pip-record-ysbyt_yn/install-record.txt --single-version-externally-managed --compile --install-headers /Users/helen/errant/errant/errant_env/include/site/python3.6/murmurhash cwd: /private/var/folders/1_/s5yrcv056kq02m5_6v6pv23c0000gn/T/pip-install-972fo3pe/murmurhash/ Complete output (36 lines): running install running build running build_py creating build creating build/lib.macosx-10.7-x86_64-3.6 creating build/lib.macosx-10.7-x86_64-3.6/murmurhash copying murmurhash/__init__.py -> build/lib.macosx-10.7-x86_64-3.6/murmurhash copying murmurhash/about.py -> build/lib.macosx-10.7-x86_64-3.6/murmurhash creating build/lib.macosx-10.7-x86_64-3.6/murmurhash/tests copying murmurhash/tests/__init__.py -> build/lib.macosx-10.7-x86_64-3.6/murmurhash/tests copying murmurhash/tests/test_import.py -> build/lib.macosx-10.7-x86_64-3.6/murmurhash/tests copying murmurhash/mrmr.pyx -> build/lib.macosx-10.7-x86_64-3.6/murmurhash copying murmurhash/__init__.pxd -> build/lib.macosx-10.7-x86_64-3.6/murmurhash copying murmurhash/mrmr.pxd -> 
build/lib.macosx-10.7-x86_64-3.6/murmurhash creating build/lib.macosx-10.7-x86_64-3.6/murmurhash/include creating build/lib.macosx-10.7-x86_64-3.6/murmurhash/include/murmurhash copying murmurhash/include/murmurhash/MurmurHash2.h -> build/lib.macosx-10.7-x86_64-3.6/murmurhash/include/murmurhash copying murmurhash/include/murmurhash/MurmurHash3.h -> build/lib.macosx-10.7-x86_64-3.6/murmurhash/include/murmurhash running build_ext building 'murmurhash.mrmr' extension creating build/temp.macosx-10.7-x86_64-3.6 creating build/temp.macosx-10.7-x86_64-3.6/murmurhash gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/helen/anaconda3/include -arch x86_64 -I/Users/helen/anaconda3/include -arch x86_64 -I/Users/helen/anaconda3/include/python3.6m -I/private/var/folders/1_/s5yrcv056kq02m5_6v6pv23c0000gn/T/pip-install-972fo3pe/murmurhash/murmurhash/include -I/Users/helen/errant/errant/errant_env/include -I/Users/helen/anaconda3/include/python3.6m -c murmurhash/mrmr.cpp -o build/temp.macosx-10.7-x86_64-3.6/murmurhash/mrmr.o -O3 -Wno-strict-prototypes -Wno-unused-function warning: include path for stdlibc++ headers not found; pass '-stdlib=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found] 1 warning generated. 
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/helen/anaconda3/include -arch x86_64 -I/Users/helen/anaconda3/include -arch x86_64 -I/Users/helen/anaconda3/include/python3.6m -I/private/var/folders/1_/s5yrcv056kq02m5_6v6pv23c0000gn/T/pip-install-972fo3pe/murmurhash/murmurhash/include -I/Users/helen/errant/errant/errant_env/include -I/Users/helen/anaconda3/include/python3.6m -c murmurhash/MurmurHash2.cpp -o build/temp.macosx-10.7-x86_64-3.6/murmurhash/MurmurHash2.o -O3 -Wno-strict-prototypes -Wno-unused-function warning: include path for stdlibc++ headers not found; pass '-stdlib=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found] 1 warning generated. gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/helen/anaconda3/include -arch x86_64 -I/Users/helen/anaconda3/include -arch x86_64 -I/Users/helen/anaconda3/include/python3.6m -I/private/var/folders/1_/s5yrcv056kq02m5_6v6pv23c0000gn/T/pip-install-972fo3pe/murmurhash/murmurhash/include -I/Users/helen/errant/errant/errant_env/include -I/Users/helen/anaconda3/include/python3.6m -c murmurhash/MurmurHash3.cpp -o build/temp.macosx-10.7-x86_64-3.6/murmurhash/MurmurHash3.o -O3 -Wno-strict-prototypes -Wno-unused-function warning: include path for stdlibc++ headers not found; pass '-stdlib=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found] 1 warning generated. 
g++ -bundle -undefined dynamic_lookup -L/Users/helen/anaconda3/lib -arch x86_64 -L/Users/helen/anaconda3/lib -arch x86_64 -arch x86_64 build/temp.macosx-10.7-x86_64-3.6/murmurhash/mrmr.o build/temp.macosx-10.7-x86_64-3.6/murmurhash/MurmurHash2.o build/temp.macosx-10.7-x86_64-3.6/murmurhash/MurmurHash3.o -o build/lib.macosx-10.7-x86_64-3.6/murmurhash/mrmr.cpython-36m-darwin.so clang: warning: libstdc++ is deprecated; move to libc++ with a minimum deployment target of OS X 10.9 [-Wdeprecated] ld: library not found for -lstdc++ clang: error: linker command failed with exit code 1 (use -v to see invocation) error: command 'g++' failed with exit status 1 ---------------------------------------- ERROR: Command errored out with exit status 1: /Users/helen/errant/errant/errant_env/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/1_/s5yrcv056kq02m5_6v6pv23c0000gn/T/pip-install-972fo3pe/murmurhash/setup.py'"'"'; __file__='"'"'/private/var/folders/1_/s5yrcv056kq02m5_6v6pv23c0000gn/T/pip-install-972fo3pe/murmurhash/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/1_/s5yrcv056kq02m5_6v6pv23c0000gn/T/pip-record-ysbyt_yn/install-record.txt --single-version-externally-managed --compile --install-headers /Users/helen/errant/errant/errant_env/include/site/python3.6/murmurhash Check the logs for full command output.
I searched for the error on Google and tried the possible solutions, such as updating my Xcode and reinstalling the command line tools, but it doesn't work. Could you give me some advice? Thanks in advance for your help!
Hi, I have a question about duplicate corrections.
errant_parallel sometimes makes duplicate corrections, e.g.
echo "If you want to actally know somebody you can spend the whole day with that person or place but if you do not , you do not even speak to that person or even go there . " > orig.txt
echo "If you want to actually know somebody , you can spend the whole day with that person or place , but if you do not , you do not even speak to that person or even go there . " > sys.txt
echo "If you want to actually get to know someone , or something , you can spend the whole day with that person , or place , and if you do not , you would n't have reason to even speak to that person , or even go there . " > ref.txt
errant_parallel -orig orig.txt -cor sys.txt -out hyp.m2
errant_parallel -orig orig.txt -cor ref.txt -out ref.m2
errant_compare -hyp hyp.m2 -ref ref.m2
(The above is line 612 of JFLEG-dev. The reference is the first annotation.)
In the above case, errant_compare shows:
=========== Span-Based Correction ============
TP FP FN Prec Rec F0.5
4 0 9 1.0 0.3077 0.6897
==============================================
However, hyp.m2 has only three corrections, so TP=4 is strange.
S If you want to actally know somebody you can spend the whole day with that person or place but if you do not , you do not even speak to that person or even go there .
A 4 5|||R:SPELL|||actually|||REQUIRED|||-NONE-|||0
A 7 7|||M:PUNCT|||,|||REQUIRED|||-NONE-|||0
A 18 18|||M:PUNCT|||,|||REQUIRED|||-NONE-|||0
The reason for this is duplicate corrections in the reference.
Actually, ref.m2 has two identical lines of A 7 7|||M:PUNCT|||,|||REQUIRED|||-NONE-|||0.
(I don't know why such duplication appears.)
S If you want to actally know somebody you can spend the whole day with that person or place but if you do not , you do not even speak to that person or even go there .
A 4 5|||R:SPELL|||actually|||REQUIRED|||-NONE-|||0
A 5 5|||M:VERB|||get to|||REQUIRED|||-NONE-|||0
A 6 7|||R:NOUN|||someone|||REQUIRED|||-NONE-|||0
A 7 7|||M:PUNCT|||,|||REQUIRED|||-NONE-|||0
A 7 7|||M:CONJ|||or|||REQUIRED|||-NONE-|||0
A 7 7|||M:NOUN|||something|||REQUIRED|||-NONE-|||0
A 7 7|||M:PUNCT|||,|||REQUIRED|||-NONE-|||0
A 16 16|||M:PUNCT|||,|||REQUIRED|||-NONE-|||0
A 18 18|||M:PUNCT|||,|||REQUIRED|||-NONE-|||0
A 18 19|||R:CONJ|||and|||REQUIRED|||-NONE-|||0
A 25 27|||R:OTHER|||would n't have|||REQUIRED|||-NONE-|||0
A 27 27|||M:OTHER|||reason to|||REQUIRED|||-NONE-|||0
A 32 32|||M:PUNCT|||,|||REQUIRED|||-NONE-|||0
During errant_compare, coder_dict[coder][(7, 7, ',')] has multiple values: ['M:PUNCT', 'M:PUNCT'].
This adds two points to the evaluation score because ref_edits[h_edit] has two values (in here).
Is this expected?
Personally, I do not think it is desirable for the number of TPs to exceed the number of edits in a hypothesis.
Possible solutions would be to (a) prevent errant.Annotator.annotate() from outputting duplicate corrections, or (b) ensure that each entry of the coder_dict variable in errant.commands.compare_m2.py only has a single value (now it is a list). Thank you for your development of ERRANT!
(This is an aside, but I am developing an API-based errant_compare and noticed this problem because my results did not match the official results.)
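As a stopgap on the evaluation side, duplicate edits could be dropped before scoring. A minimal order-preserving sketch; the tuple layout here is an assumption for illustration, not compare_m2's actual data structure:

```python
def dedupe_edits(edits):
    # Keep the first occurrence of each edit tuple, preserving order.
    seen, unique = set(), []
    for edit in edits:
        if edit not in seen:
            seen.add(edit)
            unique.append(edit)
    return unique

edits = [(7, 7, ",", "M:PUNCT"), (7, 7, "or", "M:CONJ"), (7, 7, ",", "M:PUNCT")]
print(dedupe_edits(edits))  # [(7, 7, ',', 'M:PUNCT'), (7, 7, 'or', 'M:CONJ')]
```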
Hi!
I have found a bug which is pretty difficult to replicate: in certain cases (especially if you re-install spacy after installing errant), errant will "apparently" work, giving feedback on the sentences it corrects... but in reality it won't, resulting in errant not recognising most of the mistakes.
Would it be possible to add a simple test with a few basic sentence pairs, e.g. "He go home. -> He goes home.", on which errant is evaluated, so that after installation one can check whether spacy is working?
I know that, especially with spacy 2.x, the results won't always be the same... but I still think this kind of feedback could be useful to check that errant is working "reasonably" well together with spacy.
If that is okay, I can make a PR with a new folder and file tests/test_errant_base.py, with 10-20 simple sentence pairs, where I check how many of the mistakes are correctly recognised by errant.
Hello Chris, I'm trying to convert my parallel dataset into M2 format, so I used:
import errant
!errant_parallel -orig D5-src.txt -cor D5-trg.txt -out /out_m2.m2
and the output I got is:
Loading resources... Processing parallel files...
Am I doing something wrong?
Note: I am using Google Colab and my dataset is in Arabic.
Can errant simulate errors in a sentence instead of correcting it? I am trying to build a dataset with a ground-truth original transcript/text and an errorful version of it.
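As far as I know, ERRANT only annotates and scores; it does not generate errors. A crude corruption sketch (entirely hypothetical, not part of ERRANT) that could bootstrap such a dataset:

```python
import random

def corrupt(sentence, p_drop=0.1, p_swap=0.1, seed=0):
    # Naive noise injection: randomly drop a token, or swap it with the
    # next one; everything else passes through unchanged.
    rng = random.Random(seed)
    toks = sentence.split()
    out, i = [], 0
    while i < len(toks):
        r = rng.random()
        if r < p_drop:
            i += 1                             # drop this token
        elif r < p_drop + p_swap and i + 1 < len(toks):
            out += [toks[i + 1], toks[i]]      # swap adjacent tokens
            i += 2
        else:
            out.append(toks[i])
            i += 1
    return " ".join(out)

clean = "This is a grammatical sentence ."
print(clean, "->", corrupt(clean, seed=3))
```

Real error simulation usually works at the morphological level (agreement, tense, determiners), but even token-level noise like this gives aligned clean/noisy pairs.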
It would be great if the functionality in the errant_compare command were available for invocation as an API call, so it could be used for things like early stopping when training GEC models.
I've looked through the compare_m2 file, and it doesn't look like it would be much work to refactor things so that everything worked the same way, but it was possible to import a function that returned a dict with the computed scores instead of printing them. So if this is the kind of thing you'd be willing to accept a PR for, I'd be happy to give it a go myself sometime in the next couple of weeks. If not, it would be super awesome if you were able to get to it at some point.
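To illustrate, the importable function could return something like the dict below. The name and layout are my own suggestion, not existing ERRANT API; only the P/R/F-beta arithmetic is standard:

```python
def span_scores(tp, fp, fn, beta=0.5):
    # P/R/F-beta from span-based TP/FP/FN counts, returned as a dict
    # instead of printed; empty denominators count as perfect scores.
    p = tp / (tp + fp) if tp + fp else 1.0
    r = tp / (tp + fn) if tp + fn else 1.0
    b2 = beta * beta
    f = (1 + b2) * p * r / (b2 * p + r) if p + r else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": round(p, 4), "recall": round(r, 4), "f": round(f, 4)}

print(span_scores(4, 0, 9))
# {'tp': 4, 'fp': 0, 'fn': 9, 'precision': 1.0, 'recall': 0.3077, 'f': 0.6897}
```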
Hi @chrisjbryant , the API quickstart script below is not working.
import errant
annotator = errant.load('en')
orig = annotator.parse('This are gramamtical sentence .')
cor = annotator.parse('This is a grammatical sentence .')
edits = annotator.annotate(orig, cor)
for e in edits:
    print(e.o_start, e.o_end, e.o_str, e.c_start, e.c_end, e.c_str, e.type)
Error:
OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
After python3 -m spacy download en_core_web_sm, it says:
Successfully installed en_core_web_sm-2.3.1
✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')
You do not have sufficient privilege to perform this operation.
✘ Couldn't link model to 'en'
Creating a symlink in spacy/data failed. Make sure you have the required
permissions and try re-running the command as admin, or use a virtualenv. You
can still import the model as a module and call its load() method, or create the
symlink manually.
C:\Users\xxx\anaconda3\envs\chat-langchain\lib\site-packages\en_core_web_sm
--> C:\Users\xxx\anaconda3\envs\chat-langchain\lib\site-packages\spacy\data\en
⚠ Download successful but linking failed
Creating a shortcut link for 'en' didn't work (maybe you don't have admin
permissions?), but you can still load the model via its full package name: nlp =
spacy.load('en_core_web_sm')
I had to update the code as below before it worked.
import errant
import spacy
import spacy.cli
# spacy.cli.download("en_core_web_md")
nlp = spacy.load('en_core_web_md')
annotator = errant.load('en', nlp)
# annotator = errant.load('en_core_web_md')
orig = annotator.parse('This are gramamtical sentence .')
cor = annotator.parse('This is a grammatical sentence .')
edits = annotator.annotate(orig, cor)
for e in edits:
    print(e.o_start, e.o_end, e.o_str, e.c_start, e.c_end, e.c_str, e.type)
Dear Chris :)
I have applied ERRANT to the FCE test set and got unexpected results:
=========== Span-Based Correction ============
TP FP FN Prec Rec F0.5
2 15503 4547 0.0001 0.0004 0.0002
==============================================
I followed the implementation exactly as in the documentation, as follows:
I applied errant_parallel using errant_parallel -orig m2Scripts/orig_sentes.txt -cor m2Scripts/corec_sentes.txt -out m2Scripts/output.m2, with the files as on GitHub. Actually, I'm confused about the format of the parallel corrected text file.
For errant_m2, I applied errant_m2 -auto m2Scripts/output.m2 -out m2Scripts/auto_output.
The last step was errant_compare, as errant_compare -hyp m2Scripts/auto_output -ref m2Scripts/fce.test.m2, and the results were:
=========== Span-Based Correction ============
TP FP FN Prec Rec F0.5
2 15503 4547 0.0001 0.0004 0.0002
==============================================
Could you please help to fix this issue?
Kind regards
Aiman Solyman
Could you add /*.egg-info/ and __pycache__/ to .gitignore? I'm using this repo as a git submodule, and these temporary files are created after installation.
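For reference, the two entries requested would look like this in .gitignore (assuming the defaults; adjust if build artefacts end up elsewhere):

```
# build/runtime artefacts left behind by installation
/*.egg-info/
__pycache__/
```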
Some default Python installations are missing the wheel package or use an older version of it.
This raises an error when installing errant: error: invalid command 'bdist_wheel'
Although ERRANT is actually installed successfully and you can ignore this error, the fix is simply to install/upgrade the wheel package in your Python venv before you install errant:
pip3 install -U wheel
I was adapting this code for our private use. I noticed that using the python-Levenshtein package is 100x faster than SequenceMatcher's ratio; Levenshtein.ratio(A, B) gets you the same result.
I understand that this library is more for offline benchmarking use, but it doesn't hurt to be faster 😉.
By the way, can you explain the rationale for the custom cost function for substitutions? Any example of how using it changes the path taken?
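If it helps future readers: python-Levenshtein's ratio() is, as I understand it, a normalised indel-style distance (substitutions cost 2), which is why it agrees with SequenceMatcher.ratio() in cases like the one below. A pure-stdlib sketch of that score (my own reimplementation, for illustration only):

```python
from difflib import SequenceMatcher

def indel_ratio(a, b):
    # Edit distance where a substitution costs 2 (delete + insert),
    # normalised as (len(a) + len(b) - dist) / (len(a) + len(b)).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 2
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    total = len(a) + len(b)
    return (total - prev[-1]) / total

a, b = "kitten", "sitting"
print(round(indel_ratio(a, b), 4))                    # 0.6154
print(round(SequenceMatcher(None, a, b).ratio(), 4))  # 0.6154
```

The two can diverge when SequenceMatcher's greedy longest-block matching misses the full common subsequence (or when its autojunk heuristic kicks in on long strings), so the agreement is common but not guaranteed.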
There are some sentences where I noticed that the error type assignment is not accurate enough.
I noticed that ERRANT uses a model of size sm, and I intended to replace it with a larger model, but it seems the improvement is not significant. Is there any other way to improve its accuracy?
By the way, thank you for providing this tool; it is very useful!
Hi, thanks for the awesome library.
Is there any way to remove python-Levenshtein from the dependencies? It's licensed under GPLv2, which is not compatible with errant's MIT license.
Dear @ALL,
I have looked over your documentation, but I'm still confused about how to evaluate my neural network GEC model. As I understand it, I have to run the model over the test set (correcting it), then build a new M2 file using the errant_parallel command. The last step is to use errant_compare with the span-based correction scores to get the F0.5 score.
Is this correct?
What is the optimal way to evaluate my NN model using ERRANT?
Regards,
I used errant to preprocess the Oscar Tamil Dataset.
The source m2 file looks like this:
S முன்னாள் ஜனாதிபதி மஹிந்த ராஜபக்ஷவினால் முன்னெடுக்கப்பட்ட போராட்டம் உட்பட வேலைநிறுத்த போராட்டங்களுக்கான நிதி அனுசரணையை சீனாவே வழங்கி நாட்டையும் அரசாங்கத்தையும் நெருக்கடிக்குள்ளாக்க முயல்கிறது என சமூக நலன்புரி பிரதி அமைச்சர் ரஞ்சன் ராமநாயக்க தெரிவித்தார்.
A 0 1|||R:OTHER|||முன்னாழ்|||REQUIRED|||-NONE-|||0
A 2 3|||R:OTHER|||மஹிண்த|||REQUIRED|||-NONE-|||0
A 7 8|||R:OTHER|||வேளைணிறுத்த|||REQUIRED|||-NONE-|||0
A 8 9|||R:NOUN|||போராட்டங்களுக்காண|||REQUIRED|||-NONE-|||0
A 9 10|||R:OTHER|||ணிதி|||REQUIRED|||-NONE-|||0
A 10 11|||R:NOUN|||அநுசரநையை|||REQUIRED|||-NONE-|||0
A 15 16|||R:NOUN|||ணெருக்கடிக்குல்ளாக்க|||REQUIRED|||-NONE-|||0
A 19 20|||R:OTHER|||ணலந்புரி|||REQUIRED|||-NONE-|||0
A 23 24|||R:NOUN|||ராமனாயக்க|||REQUIRED|||-NONE-|||0
A 24 25|||R:OTHER|||தெரிவித்தார்|||REQUIRED|||-NONE-|||0
The corresponding generated section of corr_sentences.txt looks like this:
S முன்னாள் ஜனாதிபதி மஹிந்த ராஜபக்ஷவினால் முன்னெடுக்கப்பட்ட போராட்டம் உட்பட வேலைநிறுத்த போராட்டங்களுக்கான நிதி அனுசரணையை சீனாவே வழங்கி நாட்டையும் அரசாங்கத்தையும் நெருக்கடிக்குள்ளாக்க முயல்கிறது என சமூக நலன்புரி பிரதி அமைச்சர் ரஞ்சன் ராமநாயக்க தெரிவித்தார்.
A 0 1|||R:OTHER|||முன்னாழ்|||REQUIRED|||-NONE-|||0
A 2 3|||R:OTHER|||மஹிண்த|||REQUIRED|||-NONE-|||0
A 7 8|||R:OTHER|||வேளைணிறுத்த|||REQUIRED|||-NONE-|||0
A 8 9|||R:NOUN|||போராட்டங்களுக்காண|||REQUIRED|||-NONE-|||0
A 9 10|||R:OTHER|||ணிதி|||REQUIRED|||-NONE-|||0
A 10 11|||R:NOUN|||அநுசரநையை|||REQUIRED|||-NONE-|||0
A 15 16|||R:NOUN|||ணெருக்கடிக்குல்ளாக்க|||REQUIRED|||-NONE-|||0
A 19 20|||R:OTHER|||ணலந்புரி|||REQUIRED|||-NONE-|||0
A 23 24|||R:NOUN|||ராமனாயக்க|||REQUIRED|||-NONE-|||0
A 24 25|||R:OTHER|||தெரிவித்தார்|||REQUIRED|||-NONE-|||0
The corresponding section of incorr_sentences.txt looks like this:
S முன்னாள் ஜனாதிபதி மஹிந்த ராஜபக்ஷவினால் முன்னெடுக்கப்பட்ட போராட்டம் உட்பட வேலைநிறுத்த போராட்டங்களுக்கான நிதி அனுசரணையை சீனாவே வழங்கி நாட்டையும் அரசாங்கத்தையும் நெருக்கடிக்குள்ளாக்க முயல்கிறது என சமூக நலன்புரி பிரதி அமைச்சர் ரஞ்சன் ராமநாயக்க தெரிவித்தார்.
0 1|||R:OTHER|||முன்னாழ்|||REQUIRED|||-NONE-|||0
A work 3|||R:OTHER|||மஹிண்த|||REQUIRED|||-NONE-|||0
7 8|||R:OTHER|||வேளைணிறுத்த|||REQUIRED|||-NONE-|||0
Badly do my 9|||R:NOUN|||போராட்டங்களுக்காண|||REQUIRED|||-NONE-|||0
9 10|||R:OTHER|||ணிதி|||REQUIRED|||-NONE-|||0
A English 10 11|||R:NOUN|||அநுசரநையை|||REQUIRED|||-NONE-|||0
up relatively 15 16|||R:NOUN|||ணெருக்கடிக்குல்ளாக்க|||REQUIRED|||-NONE-|||0
19 20|||R:OTHER|||ணலந்புரி|||REQUIRED|||-NONE-|||0
Change 23 24|||R:NOUN|||ராமனாயக்க|||REQUIRED|||-NONE-|||0
24 25|||R:OTHER|||தெரிவித்தார்|||REQUIRED|||-NONE-|||0
The first correction line doesn't start with A. The word work appears randomly after A in the second correction line, and the corresponding sentence in the source file does not contain the word work. Other lines show similar patterns.
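A tiny validator along these lines (a hypothetical helper, not part of ERRANT) would catch this corruption early by flagging correction lines that lack the well-formed A-line prefix:

```python
import re

# One well-formed M2 edit line: 'A <start> <end>|||<type>|||...'
A_LINE = re.compile(r"^A -?\d+ -?\d+\|\|\|")

def malformed_edit_lines(m2_block):
    # Everything after the 'S ' sentence line must be a well-formed A-line.
    lines = m2_block.strip().split("\n")
    return [ln for ln in lines[1:] if not A_LINE.match(ln)]

block = ("S a b\n"
         "A 0 1|||R:OTHER|||x|||REQUIRED|||-NONE-|||0\n"
         "Change 23 24|||R:NOUN|||y|||REQUIRED|||-NONE-|||0")
print(malformed_edit_lines(block))  # the 'Change 23 24...' line is flagged
```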
Hello Chris,
I am working on ERRANT for Czech and found the following line problematic:
Line 64 in 6c0d521
The issue is that the condition is True even if start != 0, whenever (len(c) == 1 and o[0].text[0].isupper()) evaluates to True.
In such a case, the return on lines 66-67 will omit "the preceding part of combo".
I suppose that the fix is to enforce start == 0 by adding a pair of brackets:
if start == 0 and ((len(o) == 1 and c[0].text[0].isupper()) or \
(len(c) == 1 and o[0].text[0].isupper())):
Traceback (most recent call last):
File "C:/Users/ITJaylon/Desktop/errant/errant/test.py", line 3, in <module>
annotator = errant.load('en')
File "C:\Users\ITJaylon\Desktop\errant\errant\__init__.py", line 16, in load
nlp = nlp or spacy.load(lang, disable=["ner"])
File "E:\Anaconda\envs\errant_env\lib\site-packages\spacy\__init__.py", line 30, in load
return util.load_model(name, **overrides)
File "E:\Anaconda\envs\errant_env\lib\site-packages\spacy\util.py", line 172, in load_model
return load_model_from_path(Path(name), **overrides)
File "E:\Anaconda\envs\errant_env\lib\site-packages\spacy\util.py", line 198, in load_model_from_path
meta = get_model_meta(model_path)
File "E:\Anaconda\envs\errant_env\lib\site-packages\spacy\util.py", line 253, in get_model_meta
raise IOError(Errors.E053.format(path=meta_path))
OSError: [E053] Could not read meta.json from en\meta.json
Process finished with exit code 1
There is a line of code in the parse function:
text = self.nlp.tokenizer.tokens_from_list(text.split())
Why not just use nlp.tokenizer(text) directly? This could really accelerate the tokenizing process.
Hi 😊
I've encountered a problem while using errant:
I think there is a conflict between the versions of Python and spaCy, and I couldn't fix it.
Python 3.6.13 |Anaconda, Inc.| (default, Jun 4 2021, 14:25:59)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import errant
>>> errant.load('en')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/root/anaconda3/envs/errant200/lib/python3.6/site-packages/errant/__init__.py", line 19, in load
classifier = import_module("errant.%s.classifier" % lang)
File "/root/anaconda3/envs/errant200/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/root/anaconda3/envs/errant200/lib/python3.6/site-packages/errant/en/classifier.py", line 40, in <module>
I use
Thank you
Hello :)
Is there any way to use errant with other languages, like Arabic, using multilingual BERT or BPEmb instead of spacy?
Kind regards,
Aiman Solyman