CAMeL Tools

Introduction

CAMeL Tools is a suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi.

Please use GitHub Issues to report a bug or if you need help using CAMeL Tools.

Installation

You will need Python 3.7 - 3.10 (64-bit) as well as the Rust compiler installed.

Linux/macOS

You will need to install some additional dependencies on Linux and macOS, primarily CMake and Boost.

On Ubuntu/Debian you can install these dependencies by running:

sudo apt-get install cmake libboost-all-dev

On macOS you can install them using Homebrew by running:

brew install cmake boost

Install using pip

pip install camel-tools

# or run the following if you already have camel_tools installed
pip install camel-tools --upgrade

On Apple silicon Macs you may have to run the following instead:

CMAKE_OSX_ARCHITECTURES=arm64 pip install camel-tools

# or run the following if you already have camel_tools installed
CMAKE_OSX_ARCHITECTURES=arm64 pip install camel-tools --upgrade

Install from source

# Clone the repo
git clone https://github.com/CAMeL-Lab/camel_tools.git
cd camel_tools

# Install from source
pip install .

# or run the following if you already have camel_tools installed
pip install --upgrade .

Installing data

To install the datasets required by CAMeL Tools components run one of the following:

# To install all datasets
camel_data -i all

# or just the datasets for morphology and MLE disambiguation only
camel_data -i light

# or just the default datasets for each component
camel_data -i defaults

See Available Packages for a list of all available datasets.

By default, data is stored in ~/.camel_tools. Alternatively, if you would like to install the data in a different location, you need to set the CAMELTOOLS_DATA environment variable to the desired path.

Add the following to your .bashrc, .zshrc, .profile, etc:

export CAMELTOOLS_DATA=/path/to/camel_tools_data

Windows

Note: CAMeL Tools has been tested on Windows 10. The Dialect Identification component is not available on Windows at this time.

Install using pip

pip install camel-tools -f https://download.pytorch.org/whl/torch_stable.html

# or run the following if you already have camel_tools installed
pip install --upgrade -f https://download.pytorch.org/whl/torch_stable.html camel-tools

Install from source

# Clone the repo
git clone https://github.com/CAMeL-Lab/camel_tools.git
cd camel_tools

# Install from source
pip install -f https://download.pytorch.org/whl/torch_stable.html .
pip install --upgrade -f https://download.pytorch.org/whl/torch_stable.html .

Installing data

To install the data packages required by CAMeL Tools components, run one of the following commands:

# To install all datasets
camel_data -i all

# or just the datasets for morphology and MLE disambiguation only
camel_data -i light

# or just the default datasets for each component
camel_data -i defaults

See Available Packages for a list of all available datasets.

By default, data is stored in C:\Users\your_user_name\AppData\Roaming\camel_tools. Alternatively, if you would like to install the data in a different location, you need to set the CAMELTOOLS_DATA environment variable to the desired path. Below are the instructions to do so (on Windows 10):

  • Press the Windows button and type env.
  • Click on Edit the system environment variables (Control panel).
  • Click on the Environment Variables... button.
  • Click on the New... button under the User variables panel.
  • Type CAMELTOOLS_DATA in the Variable name input box and the desired data path in Variable value. Alternatively, you can browse for the data directory by clicking on the Browse Directory... button.
  • Click OK on all the opened windows.

Documentation

To get started, you can follow along with the Guided Tour for a quick overview of the components provided by CAMeL Tools.
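For a quick first run after installing the data, here is a minimal hedged example mirroring the guided tour (it assumes the morphology and MLE disambiguation data packages above have been installed with camel_data):

from camel_tools.tokenizers.word import simple_word_tokenize
from camel_tools.disambig.mle import MLEDisambiguator

# Assumes the 'light', 'defaults', or 'all' data packages are installed.
mle = MLEDisambiguator.pretrained()
tokens = simple_word_tokenize('الطفلان أكلا الطعام معاً')
disambig = mle.disambiguate(tokens)

# Diacritized form of the top analysis for each word (in practice, check
# that each word has a non-empty list of analyses first).
print(' '.join(d.analyses[0].analysis['diac'] for d in disambig))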

You can find the full online documentation here for both the command-line tools and the Python API.

Alternatively, you can build your own local copy of the documentation as follows:

# Install dependencies
pip install sphinx recommonmark sphinx-rtd-theme

# Go to docs subdirectory
cd docs

# Build HTML docs
make html

This should compile all the HTML documentation into docs/build/html.

Citation

If you find CAMeL Tools useful in your research, please cite our paper:

@inproceedings{obeid-etal-2020-camel,
   title = "{CAM}e{L} Tools: An Open Source Python Toolkit for {A}rabic Natural Language Processing",
   author = "Obeid, Ossama  and
      Zalmout, Nasser  and
      Khalifa, Salam  and
      Taji, Dima  and
      Oudah, Mai  and
      Alhafni, Bashar  and
      Inoue, Go  and
      Eryani, Fadhl  and
      Erdmann, Alexander  and
      Habash, Nizar",
   booktitle = "Proceedings of the 12th Language Resources and Evaluation Conference",
   month = may,
   year = "2020",
   address = "Marseille, France",
   publisher = "European Language Resources Association",
   url = "https://www.aclweb.org/anthology/2020.lrec-1.868",
   pages = "7022--7032",
   abstract = "We present CAMeL Tools, a collection of open-source tools for Arabic natural language processing in Python. CAMeL Tools currently provides utilities for pre-processing, morphological modeling, Dialect Identification, Named Entity Recognition and Sentiment Analysis. In this paper, we describe the design of CAMeL Tools and the functionalities it provides.",
   language = "English",
   ISBN = "979-10-95546-34-4",
}

License

CAMeL Tools is available under the MIT license. See the LICENSE file for more info.

Contribute

If you would like to contribute to CAMeL Tools, please read the CONTRIBUTE.rst file.


camel_tools's Issues

I can't transliterate, I get an error

This is the command and the error message that I get:
camel_transliterate --scheme bw2ar ../large_file.txt
Error: An unkown error occured during transliteration.

  • File tested on is attached.
  • Spelling errors in error message (bolded)
    large_file.txt

[BUG] Error while installing camel_tools on Colab Python 3.7

Describe the bug
I tried to run the following pip command on Colab with python 3.7 version:
!pip install camel-tools
!camel_data full

Usually it works fine but today it gave me the following error message:
Error: An error occured while extracting downloaded data.

I assume this error happened due to exceeding the server's request limit.

Can you kindly help me with this issue?

Thanks in advance.

[QUESTION] Custom pre-trained model

Hey, I would like to ask if there's a way to use the NER module with a custom model trained on a different dataset.
Right now CAMeL uses one pre-trained model (AraBERT), trained on ANERcorp, to do the NER task.

The question is: can we train this model (AraBERT) on a different dataset and use it, or can we train a new model and use it?

Thanks
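For reference, a hedged sketch of loading a locally fine-tuned model (this assumes NERecognizer can be constructed from a path to a Hugging Face token-classification checkpoint; 'my_ner_model/' is a hypothetical directory, not something shipped with CAMeL Tools):

from camel_tools.ner import NERecognizer
from camel_tools.tokenizers.word import simple_word_tokenize

# Assumption: a fine-tuned token-classification checkpoint saved locally.
ner = NERecognizer('my_ner_model/')
tokens = simple_word_tokenize('تقع جامعة نيويورك أبوظبي في أبوظبي')  # illustrative input
labels = ner.predict_sentence(tokens)  # one label per input token
print(list(zip(tokens, labels)))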

Encoding error while reading database

Hi,
I am experiencing an encoding error while reading the calima_star database in a Windows environment, and that is because database.py relies on the default encoding:

with open(fpath, 'r') as dbfile:

Then, if I try db = CalimaStarDB.builtin_db(), it will raise a UnicodeError because the database is in UTF-8 (I guess), and Python's default encoding on Windows is that of the system locale (mine, CP1252). Is there any way I could specify the encoding while reading the builtin_db?

If I replace that line in database.py with the one below, the error is fixed, but that is not straightforward since I intend to make camel_tools pip-installable through requirements.

with open(fpath, 'r', encoding="utf-8") as dbfile:

Thanks in advance

[ENHANCEMENT] PoS feature enhancement

PoS feature enhancement, especially for the foreign words tag.
When I use the tagger function with non-Arabic words, it always tags them as 'noun'. Based on your guide, they should be tagged as 'FOREIGN'.

from camel_tools.tokenizers.word import simple_word_tokenize
from camel_tools.disambig.mle import MLEDisambiguator
from camel_tools.tagger.default import DefaultTagger

mle = MLEDisambiguator.pretrained()
tagger = DefaultTagger(mle, 'pos')

sentence = simple_word_tokenize(' HTML استخدم')

pos_tags = tagger.tag(sentence)

print(pos_tags)

Error while reading dialectid data

The home directory /Users/owo/ is not correct

Traceback (most recent call last):
File "kenlm.pyx", line 114, in kenlm.Model.init (python/kenlm.cpp:2132)
RuntimeError: util/file.cc:76 in int util::OpenReadOrThrow(const char*) threw ErrnoException because `-1 == (ret = open(name, 00))'.
No such file or directory while opening /Users/owo/.camel_tools/data/dialectid/default/lm/char/BAS.arpa

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "testdilectID.py", line 3, in
did = DialectIdentifier.pretrained()
File "/home/amahdaouy/.conda/envs/arbnlp/lib/python3.6/site-packages/camel_tools/dialectid/init.py", line 447, in pretrained
return dill.load(model_fp)
File "/home/amahdaouy/.local/lib/python3.6/site-packages/dill/_dill.py", line 270, in load
return Unpickler(file, ignore=ignore, *kwds).load()
File "/home/amahdaouy/.local/lib/python3.6/site-packages/dill/_dill.py", line 473, in load
obj = StockUnpickler.load(self)
File "kenlm.pyx", line 117, in kenlm.Model.init (python/kenlm.cpp:2242)
OSError: Cannot read model 'b'/Users/owo/.camel_tools/data/dialectid/default/lm/char/BAS.arpa'' (util/file.cc:76 in int util::OpenReadOrThrow(const char
) threw ErrnoException because `-1 == (ret = open(name, 00))'. No such file or directory while opening /Users/owo/.camel_tools/data/dialectid/default/lm/char/BAS.arpa)

Disambiguation performance time

Hi, CAMeL team,
I wonder if you have already tested the disambiguation module's performance in terms of time spent.
Using an Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz, it takes me around 35 minutes to tag 1,108,058 words.
Is there something I can do to shorten this time or is the disambiguation really a costly process? The pipeline I've been using is the following (older Camel version):

# 6.7 seconds initialization
db = CalimaStarDB.builtin_db()
analyzer = CalimaStarAnalyzer(db)
disambiguator = MLEDisambiguator(analyzer)

# 3.8 seconds for sentence segmentation using NLTK
sentences = nltk.sent_tokenize(text)

for sentence in sentences:
    words = simple_word_tokenize(sentence) # 1 second for all sentences summed up
    disamb_words = disambiguator.disambiguate(words) # 2033 seconds (34 minutes) for all sentences summed up

Best
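For reference, a hedged sketch of one possible speedup, assuming the cache_size parameter that appears in another issue in this thread is available in your version: enabling the analyzer's cache avoids re-analyzing repeated word forms, which tend to dominate large corpora.

from camel_tools.calima_star.database import CalimaStarDB
from camel_tools.calima_star.analyzer import CalimaStarAnalyzer
from camel_tools.disambig.mle import MLEDisambiguator

db = CalimaStarDB.builtin_db()
# Assumption: cache_size caches analyses of repeated word forms.
analyzer = CalimaStarAnalyzer(db, cache_size=100000)
disambiguator = MLEDisambiguator(analyzer)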

Question

First, I love your work, guys, it's more than amazing. I just have a question: can I do custom text to speech, since no one supports Arabic?

can't download the data required

mac-os catalina 10.15.7
I ran the command for downloading the required data in my terminal, but it didn't return anything for a long time until the error below appeared:
(base) MacBook-Pro-3:camel_tools username$ camel_data full
Error: An error occured while downloading data.

I tried it with a VPN but it still doesn't work.

Optimise NERecognizer's prediction speed

What feature would you like to improve?
NERecognizer()

Is your enhancement request related to a problem? Please describe.
I tested the speed of NERecognizer.pretrained() at making predictions. It takes 374.11 seconds to predict 549 sentences (on average, each sentence has 10.36 words and 60.87 characters). This is too slow to use for live analytical tools which process social media posts.

Describe the solution you'd like
I was wondering if there was a way to speed up this method?
Or if you could help me understand why it takes this long?
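For reference, a hedged sketch of batching predictions (assuming your version's predict() accepts a list of tokenized sentences) instead of calling predict_sentence once per sentence; the sentences here are illustrative:

from camel_tools.ner import NERecognizer
from camel_tools.tokenizers.word import simple_word_tokenize

ner = NERecognizer.pretrained()
raw_sentences = ['وصل الرئيس إلى نيويورك', 'تقع جامعة نيويورك أبوظبي في أبوظبي']
tokenized = [simple_word_tokenize(s) for s in raw_sentences]
predictions = ner.predict(tokenized)  # one list of labels per sentence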

[BUG] Incorrect Lemmatization of "يارب"

Describe the bug
The substring "يارب" is currently being tokenised as one token (which is expected behaviour, since it doesn't include a whitespace). The token is then disambiguated as "يأرب" using all schemes (except 'bwtok' which outputs "ي+_أرب"). The lemma in the most likely analysis (using the MLEDisambiguator) is أَرِب, however. Shouldn't that just be "رب"?

To Reproduce
Steps to reproduce the behavior.
Provide any Python/Shell scripts as code blocks.

from camel_tools.tokenizers.word import simple_word_tokenize
from camel_tools.disambig.mle import MLEDisambiguator

sentence = ' يارب سوره بالتوفيق يارب اوقاف القران'

tokens = simple_word_tokenize(sentence)
mle = MLEDisambiguator.pretrained()
disambig = mle.disambiguate(tokens)

lemmas = [d.analyses[0].analysis['lex'] for d in disambig]
print(lemmas)

Expected behavior
Expect "يارب" to be lemmatized as "رب"

Screenshots
If applicable, add screenshots to help explain your problem.
Preferably, attach error logs in code blocks.

Desktop (please complete the following information):

  • OS: MacOS 10.15.4
  • Python version 3.7
  • CAMeL Tools version: pip v. 1.1.0

Additional context
This is my first time executing Arabic NLP, so if I'm overlooking something or should be posting this elsewhere, please do let me know!

FileNotFoundError: [Errno 2] No such file or directory: '/root/.camel_tools/data/morphology_db/calima-msa-r13/morphology.db'

I am working on a project, and I want to use the CAMeL toolkit to perform NLP tasks. I have already installed CAMeL Tools; I am working on Google Colab, but I faced this problem:
FileNotFoundError: [Errno 2] No such file or directory: '/root/.camel_tools/data/morphology_db/calima-msa-r13/morphology.db'
Different solutions have been tried, as mentioned in other issues, such as pip install --upgrade,
camel_data full, or
camel_data light, but it is still not working.
Awaiting your kind reply.
Thank you in advance

Some operations/functions need more clarification, please

Greetings,

I appreciate the effort of doing such a grand project and making it accessible to other researchers.

I'm currently trying out the tools to use them for cleaning and extracting features from my data, and I'm having trouble using some of the functionalities because their documentation isn't published yet or isn't clear enough.

"camel_arclean"
I couldn't find its class or the way to invoke it.

Utility
arclean Cleans Arabic text by

Deleting characters that are not in Arabic, ASCII, or Latin-1.
Converting all spacing characters to an ASCII space character.
Converting Indic digits into Arabic digits.
Converting extended Arabic letters into basic Arabic letters.
Converting 1-char presentation forms into simple basic forms.
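For reference, a hedged sketch of invoking the same cleaning from Python, assuming the arclean mapping is also exposed as a builtin CharMapper (as the transliteration mappings are):

from camel_tools.utils.charmap import CharMapper

# Assumption: 'arclean' is one of the builtin mapper names.
arclean = CharMapper.builtin_mapper('arclean')
print(arclean.map_string('نصٌّ عربيّ ١٢٣'))  # illustrative input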

"dialectid "
I've tried to run the example provided but I'm getting this error

from camel_tools.dialectid import DialectIdentifier

did = DialectIdentifier.pretrained()

sentences = [
    'مال الهوى و مالي شكون اللي جابني ليك  ما كنت انايا ف حالي بلاو قلبي يانا بيك',
    'بدي دوب قلي قلي بجنون بحبك انا مجنون ما بنسى حبك يوم'
]

predictions = did.predict(sentences)
top_dialects = [p.top for p in predictions]
File "Anaconda3\lib\site-packages\camel_tools\dialectid\__init__.py", line 34, in <module>
    import kenlm

ModuleNotFoundError: No module named 'kenlm'

"CalimaStarAnalyzer"
I'm getting POS=noun_prop for all words, and never getting a stem.
I'm relying on the first returned list of the list of lists that the functions return, even though I checked the rest and didn't find any correct analysis.
I used it on my data and on the example provided but couldn't figure out what's wrong.
For example, the verb 'مشيت', when analyzed, gives a number of possible tags, but none of them is 'verb'.

text = 'مشيت في الشارع' #example provided in doc
text2 = 'مقتل ضابط وجندي إسرائيليين في عملية دهس بالضفة الغربية'
from camel_tools.calima_star.database import CalimaStarDB
from camel_tools.calima_star.analyzer import CalimaStarAnalyzer

db = CalimaStarDB('E:\\Anaconda3\\Lib\\site-packages\\camel_tools\\calima_star\\databases\\calima-msa-1.0.db', 'a')
# Create analyzer with no backoff
analyzer = CalimaStarAnalyzer(db)
# Create analyzer with NOAN_ALL backoff
#analyzer = CalimaStarAnalyzer(db, 'NOAN_ALL')
# or
analyzer = CalimaStarAnalyzer(db, backoff='NOAN_ALL')

# To analyze a word, we can use the analyze() method
analyses1 = analyzer.analyze_words(text.split())
analyses = analyzer.analyze('مقتل') # All results=مقتل/NOUN_PROP

A snippet of returned analysis

{'diac': 'مقتل',
 'lex': 'مقتل_0',
 'bw': 'مقتل/NOUN_PROP',
 'gloss': 'NO_ANALYSIS',
 'pos': 'noun_prop',
 'prc3': '0',
 'prc2': '0',
 'prc1': '0',
 'prc0': '0',
 'per': 'na',
 'asp': 'na',
 'vox': 'na',
 'mod': 'na',
 'gen': 'm',
 'num': 's',
 'stt': 'd',
 'cas': 'u',
 'enc0': '0',
 'rat': 'i',
 'source': 'backoff',
 'form_gen': 'm',
 'form_num': 's',
 'catib6': '+NOM+',
 'ud': '+PROPN+',
 'pos_freq': -1.047404,
 'pos_lex_freq': -99.0,
 'lex_freq': -99.0,
 'root': '',
 'pattern': '',
 'caphi': 'm_q_t_l',
 'atbtok': 'مقتل',
 'd2tok': 'مقتل',
 'd1tok': 'مقتل',
 'atbseg': 'مقتل',
 'd3tok': 'مقتل',
 'd3seg': 'مقتل',
 'd2seg': 'مقتل',
 'd1seg': 'مقتل',
 'stem': 'مقتل',
 'stemgloss': 'NO_ANALYSIS',
 'stemcat': 'N0'}

"Generate lemma and features (CalimaStarReinflector)"
I couldn't find the lemma DB file, and the way of constructing the features dictionary wasn't clear.

"CalimaStarGenerator"
Same issue as above.

" Morphological Analyzer "
I'm not getting any analysis results, and the morphological tokenizer's 'tokenize' is giving the same results as 'simple_word_tokenize' in tokenizers.

from camel_tools.tokenizers import morphological
from camel_tools.disambig.mle import MLEDisambiguator
from camel_tools.calima_star.analyzer import CalimaStarAnalyzer
from camel_tools.calima_star.database import CalimaStarDB

# Initialize database in reinflection mode
db_disa = CalimaStarDB('E:\\Anaconda3\\Lib\\site-packages\\camel_tools\\calima_star\\databases\\morphology_db\\almor-msa-ext\\morphology.db','r')
disa = MLEDisambiguator(CalimaStarAnalyzer(db_disa, backoff='NONE', norm_map='<camel_tools.utils.charmap.CharMapper object>', strict_digit=False, cache_size=0), mle_path=None)

disa_sentence = disa.disambiguate(text_token)#,top=1)

disa_word = disa.disambiguate_word(text_token, word_ndx =0) #,top=1)

res_morph = morphological.MorphologicalTokenizer(disa, scheme='atbtok', split=True, diac=False) #res_morph.scheme_set() #{'atbtok', 'd3tok'}

tokenized_morph = res_morph.tokenize(text_token)  #
text_token = ['مقتل',
 'ضابط',
 'وجندي',
 'إسرائيليين',
 'في',
 'عملية',
 'دهس',
 'بالضفة',
 'الغربية']

DisambiguatedWord(word='مقتل', analyses=[]),
 DisambiguatedWord(word='ضابط', analyses=[]),
 DisambiguatedWord(word='و', analyses=[]),
 DisambiguatedWord(word='جندي', analyses=[]),
 DisambiguatedWord(word='إسرائيليين', analyses=[]),
 DisambiguatedWord(word='في', analyses=[]),
 DisambiguatedWord(word='عملية', analyses=[]),
 DisambiguatedWord(word='دهس', analyses=[]),
 DisambiguatedWord(word='بالضفة', analyses=[]),
 DisambiguatedWord(word='الغربية', analyses=[])]

" MLEDisambiguator "

from camel_tools.disambig.mle import MLEDisambiguator

mle = MLEDisambiguator.pretrained()

sentence = 'الطفلان أكلا الطعام معاً وأخذا 5 تفاحات'.split()
disambig = mle.disambiguate(sentence)

# Let's, for example, use the top disambiguations to generate a diacritized
# version of the above sentence.
# Note that, in practice, you'll need to make sure that each word has a
# non-zero list of analyses.
diacritized = [d.analyses[0].analysis['diac'] for d in disambig]
print(' '.join(diacritized))

I'm getting results on some nouns, but so far I've had no luck with POS or other features such as form_num, gen, or mod when it comes to plurals, words connected to a pronoun, verbs, etc.

#print
الطفلان اكلا الطَعامِ مَعاً واخذا 5 تفاحات

#Analysis
[DisambiguatedWord(word='الطفلان', analyses=[ScoredAnalysis(score=1.0, analysis={'diac': 'الطفلان', 'lex': 'الطفلان_0', 'bw': 'الطفلان/NOUN_PROP', 'gloss': 'NO_ANALYSIS', 'pos': 'noun_prop', 'prc3': '0', 'prc2': '0', 'prc1': '0', 'prc0': '0', 'per': 'na', 'asp': 'na', 'vox': 'na', 'mod': 'na', 'stt': 'i', 'cas': 'u', 'enc0': '0', 'rat': 'i', 'source': 'backoff', 'form_gen': '-', 'form_num': '-', 'gen': '-', 'ud': '+PROPN+', 'catib6': '+NOM+', 'pos_lex_freq': -99.0, 'num': '-', 'pos_freq': -99.0, 'lex_freq': -99.0, 'caphi': '2_l_t._f_l_aa_n', 'atbseg': 'NOAN', 'd3seg': 'NOAN', 'd2tok': 'NOAN', 'root': 'O', 'pattern': 'N1AN', 'd2seg': 'NOAN', 'atbtok': 'NOAN', 'd1tok': 'NOAN', 'd3tok': 'NOAN', 'd1seg': 'NOAN', 'stem': 'الطفلان', 'stemgloss': 'NO_ANALYSIS', 'stemcat': 'N0'})]),
 DisambiguatedWord(word='أكلا', analyses=[ScoredAnalysis(score=1.0, analysis={'diac': 'اكلا', 'lex': 'اكلا_0', 'bw': 'اكلا/NOUN_PROP', 'gloss': 'NO_ANALYSIS', 'pos': 'noun_prop', 'prc3': '0', 'prc2': '0', 'prc1': '0', 'prc0': '0', 'per': 'na', 'asp': 'na', 'vox': 'na', 'mod': 'na', 'stt': 'i', 'cas': 'u', 'enc0': '0', 'rat': 'i', 'source': 'backoff', 'form_gen': '-', 'form_num': '-', 'gen': '-', 'ud': '+PROPN+', 'catib6': '+NOM+', 'pos_lex_freq': -99.0, 'num': '-', 'pos_freq': -99.0, 'lex_freq': -99.0, 'caphi': '2_k_l_aa', 'atbseg': 'NOAN', 'd3seg': 'NOAN', 'd2tok': 'NOAN', 'root': 'O', 'pattern': 'N1AN', 'd2seg': 'NOAN', 'atbtok': 'NOAN', 'd1tok': 'NOAN', 'd3tok': 'NOAN', 'd1seg': 'NOAN', 'stem': 'اكلا', 'stemgloss': 'NO_ANALYSIS', 'stemcat': 'N0'})]),
 DisambiguatedWord(word='الطعام', analyses=[ScoredAnalysis(score=1.0, analysis={'diac': 'الطَعامِ', 'lex': 'طَعام_1', 'bw': 'ال/DET+طَعام/NOUN+ِ/CASE_DEF_GEN', 'gloss': 'the+food+[def.gen.]', 'pos': 'noun', 'prc3': '0', 'prc2': '0', 'prc1': '0', 'prc0': 'Al_det', 'per': 'na', 'asp': 'na', 'vox': 'na', 'mod': 'na', 'form_gen': 'm', 'gen': 'm', 'form_num': 's', 'num': 's', 'stt': 'd', 'cas': 'g', 'enc0': '0', 'rat': 'i', 'source': 'lex', 'stem': 'طَعام', 'stemcat': 'N', 'stemgloss': 'food', 'caphi': '2_a_t._t._a_3_aa_m_i', 'catib6': 'PRT+NOM+', 'ud': 'DET+NOUN+', 'root': 'ط.ع.م', 'pattern': 'ال1َ2ا3ِ', 'd3seg': 'ال+_طَعامِ', 'atbseg': 'الطَعامِ', 'd2seg': 'الطَعامِ', 'd1seg': 'الطَعامِ', 'd1tok': 'الطَّعامِ', 'd2tok': 'الطَّعامِ', 'atbtok': 'الطَّعامِ', 'd3tok': 'ال+_طَعامِ', 'pos_freq': '-0.4344233', 'lex_freq': '-4.660188', 'pos_lex_freq': '-4.660188'})]),
 DisambiguatedWord(word='معاً', analyses=[ScoredAnalysis(score=1.0, analysis={'diac': 'مَعاً', 'lex': 'مَعاً_1', 'bw': 'مَعاً/ADV', 'gloss': 'together', 'pos': 'adv', 'prc3': '0', 'prc2': '0', 'prc1': '0', 'prc0': '0', 'per': 'na', 'asp': 'na', 'vox': 'na', 'mod': 'na', 'form_gen': '-', 'gen': '-', 'form_num': '-', 'num': '-', 'stt': 'i', 'cas': 'u', 'enc0': '0', 'rat': 'y', 'source': 'lex', 'stem': 'مَعاً', 'stemcat': 'FW-Wa', 'stemgloss': 'together', 'caphi': 'm_a_3_a_n', 'catib6': '++', 'ud': '++', 'root': 'مع', 'pattern': '1َ2اً', 'd3seg': 'مَعاً', 'atbseg': 'مَعاً', 'd2seg': 'مَعاً', 'd1seg': 'مَعاً', 'd1tok': 'مَعاً', 'd2tok': 'مَعاً', 'atbtok': 'مَعاً', 'd3tok': 'مَعاً', 'pos_freq': '-99.0', 'lex_freq': '-99.0', 'pos_lex_freq': '-99.0'})]),
 DisambiguatedWord(word='وأخذا', analyses=[ScoredAnalysis(score=1.0, analysis={'diac': 'واخذا', 'lex': 'واخذا_0', 'bw': 'واخذا/NOUN_PROP', 'gloss': 'NO_ANALYSIS', 'pos': 'noun_prop', 'prc3': '0', 'prc2': '0', 'prc1': '0', 'prc0': '0', 'per': 'na', 'asp': 'na', 'vox': 'na', 'mod': 'na', 'stt': 'i', 'cas': 'u', 'enc0': '0', 'rat': 'i', 'source': 'backoff', 'form_gen': '-', 'form_num': '-', 'gen': '-', 'ud': '+PROPN+', 'catib6': '+NOM+', 'pos_lex_freq': -99.0, 'num': '-', 'pos_freq': -99.0, 'lex_freq': -99.0, 'caphi': 'w_aa_kh_dh_aa', 'atbseg': 'NOAN', 'd3seg': 'NOAN', 'd2tok': 'NOAN', 'root': 'O', 'pattern': 'N1AN', 'd2seg': 'NOAN', 'atbtok': 'NOAN', 'd1tok': 'NOAN', 'd3tok': 'NOAN', 'd1seg': 'NOAN', 'stem': 'واخذا', 'stemgloss': 'NO_ANALYSIS', 'stemcat': 'N0'})]),
 DisambiguatedWord(word='5', analyses=[ScoredAnalysis(score=1.0, analysis={'pos': 'digit', 'diac': '5', 'lex': '5_0', 'bw': '5/NOUN_NUM', 'gloss': '5', 'prc3': 'na', 'prc2': 'na', 'prc1': 'na', 'prc0': 'na', 'per': 'na', 'asp': 'na', 'vox': 'na', 'mod': 'na', 'gen': 'na', 'num': 'na', 'stt': 'na', 'cas': 'na', 'enc0': 'na', 'rat': 'na', 'source': 'digit', 'form_gen': 'na', 'form_num': 'na', 'catib6': 'NOM', 'ud': 'NUM', 'd3seg': '5', 'atbseg': '5', 'd2seg': '5', 'd1seg': '5', 'd1tok': '5', 'd2tok': '5', 'atbtok': '5', 'd3tok': '5', 'pos_freq': -99.0, 'pos_lex_freq': -99.0, 'lex_freq': -99.0, 'root': 'DIGIT', 'pattern': 'DIGIT', 'caphi': 'DIGIT', 'stem': '5', 'stemgloss': '5', 'stemcat': None})]),
 DisambiguatedWord(word='تفاحات', analyses=[ScoredAnalysis(score=1.0, analysis={'diac': 'تفاحات', 'lex': 'تفاحات_0', 'bw': 'تفاحات/NOUN_PROP', 'gloss': 'NO_ANALYSIS', 'pos': 'noun_prop', 'prc3': '0', 'prc2': '0', 'prc1': '0', 'prc0': '0', 'per': 'na', 'asp': 'na', 'vox': 'na', 'mod': 'na', 'stt': 'i', 'cas': 'u', 'enc0': '0', 'rat': 'i', 'source': 'backoff', 'form_gen': '-', 'form_num': '-', 'gen': '-', 'ud': '+PROPN+', 'catib6': '+NOM+', 'pos_lex_freq': -99.0, 'num': '-', 'pos_freq': -99.0, 'lex_freq': -99.0, 'caphi': 't_f_aa_7_aa_t', 'atbseg': 'NOAN', 'd3seg': 'NOAN', 'd2tok': 'NOAN', 'root': 'O', 'pattern': 'N1AN', 'd2seg': 'NOAN', 'atbtok': 'NOAN', 'd1tok': 'NOAN', 'd3tok': 'NOAN', 'd1seg': 'NOAN', 'stem': 'تفاحات', 'stemgloss': 'NO_ANALYSIS', 'stemcat': 'N0'})])]

How lemmatization works

Hi, people from CAMeL,
I don't understand how lemmatization works when I use the calima_star module from CAMeL Tools (which I installed through pip).
The "lex" feature has numbers before the word itself (1_, 2_, etc.), and some have numbers after the word (_0), or even -1. What are those marks supposed to be, and is there any reading you would recommend to me?
Plus, is it safe to just strip those marks?
Thanks in advance
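For reference, a hedged illustration of stripping the trailing sense index; strip_lex below is a hypothetical helper, not part of CAMeL Tools, and whether dropping the index is safe depends on whether you need to distinguish lemma senses:

import re

def strip_lex(lex):
    """Remove a trailing sense index such as '_0', '_1', or '_-1' from a lex value."""
    return re.sub(r'_-?\d+$', '', lex)

print(strip_lex('طَعام_1'))  # -> 'طَعام'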

[BUG] Sentence length in generation of NER labels

Describe the bug
When the whole text of a book is given as input to predict_sentence to generate the NER labels, it throws an "Input sentence is too long" exception.

To Reproduce
ner = NERecognizer.pretrained()
ner.predict_sentence(text)
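For reference, a hedged workaround sketch, assuming predict_sentence expects a single tokenized sentence and predict accepts a list of them; nltk is used only as an illustrative sentence splitter, and 'book.txt' is a hypothetical input file:

import nltk  # may require nltk.download('punkt') once
from camel_tools.ner import NERecognizer
from camel_tools.tokenizers.word import simple_word_tokenize

ner = NERecognizer.pretrained()

with open('book.txt', encoding='utf-8') as f:  # hypothetical input file
    text = f.read()

sentences = [simple_word_tokenize(s) for s in nltk.sent_tokenize(text)]
labels = ner.predict(sentences)  # one list of labels per sentence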

Transliteration of Arabic text

I could not find help in the CAMeL Tools documentation on how to use the transliteration utility. If you could provide me with Python sample code, I would appreciate it.
Thank you
Here is some code I have tried:

from camel_tools.utils import *
s = "وَاخْرٍ الكلمات العربية في حال تركيبها: من الإعراب"
#print(s)
text = Transliterator.transliterate(s,strip_markers=False, ignore_markers=False)
print (text)
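For reference, a hedged sketch of the intended usage (assuming the Arabic-to-Buckwalter direction is what's wanted): a Transliterator wraps a CharMapper instance rather than being called as a static method.

from camel_tools.utils.charmap import CharMapper
from camel_tools.utils.transliterate import Transliterator

# Assumption: the 'ar2bw' builtin mapper (Arabic to Buckwalter) is the desired scheme.
ar2bw = CharMapper.builtin_mapper('ar2bw')
transliterator = Transliterator(ar2bw)

s = 'وَاخْرٍ الكلمات العربية في حال تركيبها: من الإعراب'
print(transliterator.transliterate(s, strip_markers=False, ignore_markers=False))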

[BUG] Error when extracting downloaded data using camel_data full

I am working using Colab pro. I tried to install camel tools using the following command:
!pip install camel-tools
!camel_data full

Sometimes the installation was completed successfully and sometimes I encountered the following error:
Error: An error occurred while extracting downloaded data.

Do you have any clue why such an error is generated?

Thanks in advance.

[BUG] error installing camel-tools on python 3.7 and 3.8

Describe the bug
I use camel-tools (v1.1) in my romanize-arabic-ala-lc project, which was built using Python 3.7. While trying to install my app in a fresh Python 3.7 environment, I discovered that the camel-tools installation gives an error. I tried a 3.8 environment and a similar error occurs. Python 3.9 seems to install with no issues. The error does not seem to be specific to version 1.1.

To Reproduce
pip install camel-tools==1.1
(but also pip install camel-tools without version)

Screenshots
Collecting camel-tools==1.1
Using cached camel_tools-1.1.0-py3-none-any.whl
Collecting dill
Using cached dill-0.3.4-py2.py3-none-any.whl (86 kB)
Collecting camel-kenlm
Using cached camel-kenlm-2021.12.27.tar.gz (418 kB)
Collecting transformers==3.0.2
Using cached transformers-3.0.2-py3-none-any.whl (769 kB)
Collecting scikit-learn
Using cached scikit_learn-1.0.2-cp37-cp37m-macosx_10_13_x86_64.whl (7.8 MB)
Collecting six
Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting torch>=1.3
Using cached torch-1.10.1-cp37-none-macosx_10_9_x86_64.whl (147.1 MB)
Collecting numpy
Using cached numpy-1.21.5-cp37-cp37m-macosx_10_9_x86_64.whl (16.9 MB)
Collecting docopt
Using cached docopt-0.6.2-py2.py3-none-any.whl
Collecting scipy
Using cached scipy-1.7.3-cp37-cp37m-macosx_10_9_x86_64.whl (33.0 MB)
Collecting editdistance
Using cached editdistance-0.6.0-cp37-cp37m-macosx_10_9_x86_64.whl (21 kB)
Collecting requests
Using cached requests-2.27.0-py2.py3-none-any.whl (63 kB)
Collecting cachetools
Using cached cachetools-5.0.0-py3-none-any.whl (9.1 kB)
Collecting future
Using cached future-0.18.2-py3-none-any.whl
Collecting pandas
Using cached pandas-1.3.5-cp37-cp37m-macosx_10_9_x86_64.whl (11.0 MB)
Collecting tokenizers==0.8.1.rc1
Using cached tokenizers-0.8.1rc1-cp37-cp37m-macosx_10_10_x86_64.whl (2.1 MB)
Collecting packaging
Using cached packaging-21.3-py3-none-any.whl (40 kB)
Collecting sentencepiece!=0.1.92
Using cached sentencepiece-0.1.96-cp37-cp37m-macosx_10_6_x86_64.whl (1.1 MB)
Collecting filelock
Using cached filelock-3.4.2-py3-none-any.whl (9.9 kB)
Collecting sacremoses
Using cached sacremoses-0.0.46-py3-none-any.whl (895 kB)
Collecting regex!=2019.12.17
Using cached regex-2021.11.10-cp37-cp37m-macosx_10_9_x86_64.whl (288 kB)
Collecting tqdm>=4.27
Using cached tqdm-4.62.3-py2.py3-none-any.whl (76 kB)
Collecting typing-extensions
Using cached typing_extensions-4.0.1-py3-none-any.whl (22 kB)
Collecting pyparsing!=3.0.5,>=2.0.2
Using cached pyparsing-3.0.6-py3-none-any.whl (97 kB)
Collecting pytz>=2017.3
Using cached pytz-2021.3-py2.py3-none-any.whl (503 kB)
Collecting python-dateutil>=2.7.3
Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/Caskroom/miniconda/base/envs/test-camel/lib/python3.7/site-packages (from requests->camel-tools==1.1) (2021.10.8)
Collecting idna<4,>=2.5
Using cached idna-3.3-py3-none-any.whl (61 kB)
Collecting charset-normalizer~=2.0.0
Using cached charset_normalizer-2.0.9-py3-none-any.whl (39 kB)
Collecting urllib3<1.27,>=1.21.1
Using cached urllib3-1.26.7-py2.py3-none-any.whl (138 kB)
Collecting joblib
Using cached joblib-1.1.0-py2.py3-none-any.whl (306 kB)
Collecting click
Using cached click-8.0.3-py3-none-any.whl (97 kB)
Collecting importlib-metadata
Using cached importlib_metadata-4.10.0-py3-none-any.whl (17 kB)
Collecting zipp>=0.5
Using cached zipp-3.7.0-py3-none-any.whl (5.3 kB)
Collecting threadpoolctl>=2.0.0
Using cached threadpoolctl-3.0.0-py3-none-any.whl (14 kB)
Building wheels for collected packages: camel-kenlm
Building wheel for camel-kenlm (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /usr/local/Caskroom/miniconda/base/envs/test-camel/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/33/0wcrq6q95r1c11srbk4x2yzm0000gn/T/pip-install-len8q7si/camel-kenlm_b11ec25ad31545fcb1d7392ddbd4efdf/setup.py'"'"'; file='"'"'/private/var/folders/33/0wcrq6q95r1c11srbk4x2yzm0000gn/T/pip-install-len8q7si/camel-kenlm_b11ec25ad31545fcb1d7392ddbd4efdf/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d /private/var/folders/33/0wcrq6q95r1c11srbk4x2yzm0000gn/T/pip-wheel-yk0q8tf2
cwd: /private/var/folders/33/0wcrq6q95r1c11srbk4x2yzm0000gn/T/pip-install-len8q7si/camel-kenlm_b11ec25ad31545fcb1d7392ddbd4efdf/
Complete output (13 lines):
running bdist_wheel
running build
running build_ext
building 'kenlm' extension
creating build
creating build/temp.macosx-10.9-x86_64-3.7
creating build/temp.macosx-10.9-x86_64-3.7/util
creating build/temp.macosx-10.9-x86_64-3.7/lm
creating build/temp.macosx-10.9-x86_64-3.7/util/double-conversion
creating build/temp.macosx-10.9-x86_64-3.7/python
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/usr/local/Caskroom/miniconda/base/envs/test-camel/include -arch x86_64 -I/usr/local/Caskroom/miniconda/base/envs/test-camel/include -arch x86_64 -I. -I/usr/local/Caskroom/miniconda/base/envs/test-camel/include/python3.7m -c util/pool.cc -o build/temp.macosx-10.9-x86_64-3.7/util/pool.o -O3 -DNDEBUG -DKENLM_MAX_ORDER=6 -std=c++11 -stdlib=libc++ -mmacosx-version-min=10.7 -DHAVE_ZLIB -DHAVE_BZLIB
gcc: error: unrecognized command-line option '-stdlib=libc++'
error: command 'gcc' failed with exit status 1

ERROR: Failed building wheel for camel-kenlm
Running setup.py clean for camel-kenlm
Failed to build camel-kenlm
Installing collected packages: zipp, typing-extensions, importlib-metadata, urllib3, tqdm, six, regex, pyparsing, numpy, joblib, idna, click, charset-normalizer, tokenizers, threadpoolctl, sentencepiece, scipy, sacremoses, requests, pytz, python-dateutil, packaging, filelock, transformers, torch, scikit-learn, pandas, future, editdistance, docopt, dill, camel-kenlm, cachetools, camel-tools
Running setup.py install for camel-kenlm ... error
ERROR: Command errored out with exit status 1:
command: /usr/local/Caskroom/miniconda/base/envs/test-camel/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/33/0wcrq6q95r1c11srbk4x2yzm0000gn/T/pip-install-len8q7si/camel-kenlm_b11ec25ad31545fcb1d7392ddbd4efdf/setup.py'"'"'; file='"'"'/private/var/folders/33/0wcrq6q95r1c11srbk4x2yzm0000gn/T/pip-install-len8q7si/camel-kenlm_b11ec25ad31545fcb1d7392ddbd4efdf/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /private/var/folders/33/0wcrq6q95r1c11srbk4x2yzm0000gn/T/pip-record-me6kza5t/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/Caskroom/miniconda/base/envs/test-camel/include/python3.7m/camel-kenlm
cwd: /private/var/folders/33/0wcrq6q95r1c11srbk4x2yzm0000gn/T/pip-install-len8q7si/camel-kenlm_b11ec25ad31545fcb1d7392ddbd4efdf/
Complete output (13 lines):
running install
running build
running build_ext
building 'kenlm' extension
creating build
creating build/temp.macosx-10.9-x86_64-3.7
creating build/temp.macosx-10.9-x86_64-3.7/util
creating build/temp.macosx-10.9-x86_64-3.7/lm
creating build/temp.macosx-10.9-x86_64-3.7/util/double-conversion
creating build/temp.macosx-10.9-x86_64-3.7/python
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/usr/local/Caskroom/miniconda/base/envs/test-camel/include -arch x86_64 -I/usr/local/Caskroom/miniconda/base/envs/test-camel/include -arch x86_64 -I. -I/usr/local/Caskroom/miniconda/base/envs/test-camel/include/python3.7m -c util/pool.cc -o build/temp.macosx-10.9-x86_64-3.7/util/pool.o -O3 -DNDEBUG -DKENLM_MAX_ORDER=6 -std=c++11 -stdlib=libc++ -mmacosx-version-min=10.7 -DHAVE_ZLIB -DHAVE_BZLIB
gcc: error: unrecognized command-line option '-stdlib=libc++'
error: command 'gcc' failed with exit status 1
----------------------------------------
ERROR: Command errored out with exit status 1: /usr/local/Caskroom/miniconda/base/envs/test-camel/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/33/0wcrq6q95r1c11srbk4x2yzm0000gn/T/pip-install-len8q7si/camel-kenlm_b11ec25ad31545fcb1d7392ddbd4efdf/setup.py'"'"'; file='"'"'/private/var/folders/33/0wcrq6q95r1c11srbk4x2yzm0000gn/T/pip-install-len8q7si/camel-kenlm_b11ec25ad31545fcb1d7392ddbd4efdf/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /private/var/folders/33/0wcrq6q95r1c11srbk4x2yzm0000gn/T/pip-record-me6kza5t/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/Caskroom/miniconda/base/envs/test-camel/include/python3.7m/camel-kenlm Check the logs for full command output.`

Desktop (please complete the following information):

  • OS: macOS 11.5.2
  • Python 3.7
  • CAMel tools 1.1 installed on pip

Additional context
The error was discovered while trying to redeploy my romanization app on gcloud, which kept failing because the deployment time surpassed the 10-minute limit.

[BUG] PicklingError when pickling MLEDisambiguator

Describe the bug
PicklingError when pickling MLEDisambiguator

To Reproduce

from camel_tools.disambig.mle import MLEDisambiguator
import pickle

d = MLEDisambiguator.pretrained()

# Write to a binary file handle (pickle.dump expects a file object).
with open('file.pkl', 'wb') as f:
    pickle.dump(d, f)

Expected behavior
A saved pickled file that can later be loaded.

[ENHANCEMENT] Running MLEDisambiguator on AWS Cluster

What feature would you like to improve?
MLEDisambiguator

Is your enhancement request related to a problem? Please describe.
It's currently not possible to run the Disambiguator on a distributed system because it uses a local morphology.db file stored in the root directory. This file is not distributed to the workers on my AWS cluster.

Describe the solution you'd like
I'd like to be able to run the Disambiguator on my AWS cluster to speed up processing.

[ENHANCEMENT] morphological analyzer

What feature would you like to improve?
analyzer.analyze()

Is your enhancement request related to a problem? Please describe.
I'm not sure if this is intentional, but I couldn't find an explanation in the documentation.
If I pass the analyzer a text that is an Arabic non-word, it returns an empty list.
E.g. analyzer.analyze('لالالالا') -> []
However, if the text is in any other alphabet (e.g. analyzer.analyze('дадада')), it returns an analysis stating 'source' as 'foreign'.

Describe the solution you'd like
I would rather it be consistent. If the word does not exist in the morph DB, it should either always return an empty list or always return an analysis. If the word has Arabic characters, the analysis can include the information that the word does not exist in your DB.

Thanks!

[Documentation] Update Morphology Features page

Describe the bug
https://camel-tools.readthedocs.io/en/latest/reference/camel_morphology_features.html appears to be outdated.

To Reproduce
Analyze "وأما". Result for rat is a u. However that's not present in the docs, this isn't the only one I fell on. Especially for rot field: it also includes r and i as well as y and n which seemed awkward.

Expected behavior
"u" present in docs.

Desktop (please complete the following information):

  • OS Linux
  • Python version 3.7.4 Anaconda
  • CAMeL Tools version: Installed through pip. 1.2.0
  • Documentation version: latest

Verb Tenses

I would like to ask if there is any morphological feature that describes the verb tense, e.g. whether the verb is future, past, or present.

Thank you :)

[QUESTION] There seems to be a problem with the .pretrained() function

I keep getting the following error:

Traceback (most recent call last):
File "/Users/Jumana/Desktop/s.py", line 2, in
d=DialectIdentifier.pretrained()
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/camel_tools/dialectid/init.py", line 616, in pretrained
'No pretrained model for current Python version found.')
camel_tools.dialectid.PretrainedModelError: No pretrained model for current Python version found.

I followed what's in these links:

https://github.com/CAMeL-Lab/camel_tools#install-using-pip
https://github.com/CAMeL-Lab/camel_tools#installing-data

My Code:

from camel_tools.dialectid import DialectIdentifier
d=DialectIdentifier.pretrained()

OS: macOS
Python: version 3.7.6
CAMeL Tools version: 1.0.1
CAMeL Tools installation source: pip

Thanks.

[BUG] Extraction error when updating CAMeL data

After updating camel_tools to version 1.2.0 (Windows 10, Python 3.6), the following error is produced after running camel_data full:

Error: An error occured while extracting downloaded data.

However, camel_data light runs with no errors.

Adding more details about the tools in the README

Dear CAMeL Tools contributors,

I am enthusiastic about the tools, but I can't find any information in the README.md file about the features that are currently available.
Can you please update it so that it's easier to navigate the repo?

Thanks,
Amr

Question about sentiment analyzer output

Hello,
I am trying to use the CAMeL sentiment analyzer on a set of tweets that I have compiled. They are preprocessed (i.e., "cleaned") and are in a CSV file. I am using pandas and am creating a new column with the assigned sentiment. The problem is that every time I run the script, I only get labels for some of the initial rows, but not the rest. For example, when the file had 6000+ clean tweets, after about ten hours (and several trials), the most I got as SA labels was for the first 153 tweets. I tried a CSV file with much less data, e.g. 500 tweets, and only got the first 52 labelled (after about 37 minutes). I am not sure what the issue is. Please advise, and thanks in advance.

[ENHANCEMENT] Title of enhancement request...

What feature would you like to improve?
Provide the name of the class, function, module, or command-line tool in question.

Is your enhancement request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Installation fails on Mac

Running pip install camel-tools fails on Mac in a freshly created Anaconda environment, as well as in other environments. I tested this with Python 3.7 and 3.8. Attached is the log file of my last install attempt.

install.log

morphology.db No such file or directory

from camel_tools.disambig.mle import MLEDisambiguator
mle = MLEDisambiguator.pretrained()

It gives me the error:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\MCST\AppData\Roaming\camel_tools\data\morphology_db\calima-msa-r13\morphology.db'

[BUG] Error: An error occured while extracting downloaded data. Windows 10

Hello Camel Team, hope you're having a delightful day.

I installed the latest CAMeL Tools (version 1.2.0) on a Windows 10 machine, using the command pip install camel-tools -f https://download.pytorch.org/whl/torch_stable.html.

Then I tried to install CAMeL's full data using the camel_data full or python -m camel_tools.cli.camel_data full command, and suddenly got an error message that says: Error: An error occured while extracting downloaded data.

When installing the light version of the data using camel_data light, it works without any problems.

Could you please let me know what step I am missing?

Expected behavior
It was expected to start installing the full 1.8 GB of CAMeL's data.

Screenshots

(base) PS C:\Users\me> camel_data full
Error: An error occured while extracting downloaded data.

Desktop (please complete the following information):

  • OS [Windows 10, Version: 21H1, 64bit]
  • Python version: 3.9 Anaconda3, 3.10 installed locally.
  • CAMeL Tools version as well as installation source (1.2.0, source).

Additional context
image

[BUG] pip install fails on python 3.10 with misleading error messages

Describe the bug
Under python 3.10.2:

  • pip install camel-tools via pypi fails, reporting "ERROR: Cannot install camel-tools==1.0.1, camel-tools==1.1.0 and camel-tools==1.2.0 because these package versions have conflicting dependencies."
  • pip install git+https://github.com/CAMeL-Lab/camel_tools.git fails reporting "ERROR: No matching distribution found for torch>=1.3"

But under python 3.9.10, camel-tools installs clean

To Reproduce

  1. Build a clean virtual environment using pyenv and direnv
$ echo 'layout pyenv 3.10.2' > .envrc
direnv: error /Users/foo/Documents/camel-test/.envrc is blocked. Run `direnv allow` to approve its content
$ direnv allow
direnv: loading ~/Documents/camel-test/.envrc
direnv: export +PYENV_VERSION +VIRTUAL_ENV ~PATH
$ which python
/Users/foo/Documents/camel-test/.direnv/python-3.10.2/bin/python
$ pip install -U pip
Requirement already satisfied: pip in ./.direnv/python-3.10.2/lib/python3.10/site-packages (21.2.4)
Collecting pip
  Downloading pip-22.0.3-py3-none-any.whl (2.1 MB)
     |████████████████████████████████| 2.1 MB 3.6 MB/s 
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 21.2.4
    Uninstalling pip-21.2.4:
      Successfully uninstalled pip-21.2.4
Successfully installed pip-22.0.3
$ pip cache purge
Files removed: 36
  2. Try to install camel-tools from PyPI with pip
$ pip install camel-tools
pip install camel-tools
Collecting camel-tools
  Downloading camel_tools-1.2.0.tar.gz (58 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.1/58.1 KB 1.1 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting future
  Downloading future-0.18.2.tar.gz (829 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 829.2/829.2 KB 8.4 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting six
  Downloading six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting docopt
  Downloading docopt-0.6.2.tar.gz (25 kB)
  Preparing metadata (setup.py) ... done
Collecting cachetools
  Downloading cachetools-5.0.0-py3-none-any.whl (9.1 kB)
Collecting numpy
  Downloading numpy-1.22.2-cp310-cp310-macosx_11_0_arm64.whl (12.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.8/12.8 MB 15.0 MB/s eta 0:00:00
Collecting scipy
  Downloading scipy-1.8.0-cp310-cp310-macosx_12_0_arm64.whl (28.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 28.7/28.7 MB 15.0 MB/s eta 0:00:00
Collecting pandas
  Downloading pandas-1.4.1-cp310-cp310-macosx_11_0_arm64.whl (10.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.5/10.5 MB 15.8 MB/s eta 0:00:00
Collecting scikit-learn
  Downloading scikit_learn-1.0.2-cp310-cp310-macosx_12_0_arm64.whl (6.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.9/6.9 MB 15.3 MB/s eta 0:00:00
Collecting dill
  Downloading dill-0.3.4-py2.py3-none-any.whl (86 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 86.9/86.9 KB 3.4 MB/s eta 0:00:00
Collecting camel-tools
  Downloading camel_tools-1.1.0.tar.gz (56 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.2/56.2 KB 2.1 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
  Downloading camel_tools-1.0.1.tar.gz (54 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54.9/54.9 KB 1.6 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
ERROR: Cannot install camel-tools==1.0.1, camel-tools==1.1.0 and camel-tools==1.2.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    camel-tools 1.2.0 depends on torch>=1.3
    camel-tools 1.1.0 depends on torch>=1.3
    camel-tools 1.0.1 depends on torch>=1.3

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
  3. Try instead to install with pip from GitHub:
$ pip cache purge
Files removed: 23
pip install git+https://github.com/CAMeL-Lab/camel_tools.git
Collecting git+https://github.com/CAMeL-Lab/camel_tools.git
  Cloning https://github.com/CAMeL-Lab/camel_tools.git to /private/var/folders/1f/w7zrbf5n4p5_vbys0mlx23k40000gn/T/pip-req-build-um_xvl55
  Running command git clone --filter=blob:none --quiet https://github.com/CAMeL-Lab/camel_tools.git /private/var/folders/1f/w7zrbf5n4p5_vbys0mlx23k40000gn/T/pip-req-build-um_xvl55
  Resolved https://github.com/CAMeL-Lab/camel_tools.git to commit c26947b53ea93526099a1291ebebc52902fb7ff9
  Preparing metadata (setup.py) ... done
Collecting future
  Downloading future-0.18.2.tar.gz (829 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 829.2/829.2 KB 7.5 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting six
  Downloading six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting docopt
  Downloading docopt-0.6.2.tar.gz (25 kB)
  Preparing metadata (setup.py) ... done
Collecting cachetools
  Downloading cachetools-5.0.0-py3-none-any.whl (9.1 kB)
Collecting numpy
  Downloading numpy-1.22.2-cp310-cp310-macosx_11_0_arm64.whl (12.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.8/12.8 MB 16.1 MB/s eta 0:00:00
Collecting scipy
  Downloading scipy-1.8.0-cp310-cp310-macosx_12_0_arm64.whl (28.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 28.7/28.7 MB 15.1 MB/s eta 0:00:00
Collecting pandas
  Downloading pandas-1.4.1-cp310-cp310-macosx_11_0_arm64.whl (10.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.5/10.5 MB 15.5 MB/s eta 0:00:00
Collecting scikit-learn
  Downloading scikit_learn-1.0.2-cp310-cp310-macosx_12_0_arm64.whl (6.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.9/6.9 MB 15.4 MB/s eta 0:00:00
Collecting dill
  Downloading dill-0.3.4-py2.py3-none-any.whl (86 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 86.9/86.9 KB 3.1 MB/s eta 0:00:00
ERROR: Could not find a version that satisfies the requirement torch>=1.3 (from camel-tools) (from versions: none)
ERROR: No matching distribution found for torch>=1.3
  4. Try using Python 3.9 instead of 3.10. Remove and replace the virtual environment:
$ rm -rf .direnv
$ rm -rf .envrc
direnv: unloading
$ pyenv install 3.9.10
python-build: use [email protected] from homebrew
python-build: use readline from homebrew
Downloading Python-3.9.10.tar.xz...
-> https://www.python.org/ftp/python/3.9.10/Python-3.9.10.tar.xz
Installing Python-3.9.10...
python-build: use readline from homebrew
python-build: use zlib from xcode sdk
Installed Python-3.9.10 to /Users/foo/.pyenv/versions/3.9.10
$ echo 'layout pyenv 3.9.10' > .envrc
direnv: error /Users/foo/Documents/camel-test/.envrc is blocked. Run `direnv allow` to approve its content
$ direnv allow
direnv: loading ~/Documents/camel-test/.envrc
direnv: export +PYENV_VERSION +VIRTUAL_ENV ~PATH
$ which python
/Users/foo/Documents/camel-test/.direnv/python-3.9.10/bin/python
$ pip cache purge
Files removed: 19

  5. Try again using pip and PyPI
$ pip cache purge
Files removed: 270
$ pip install camel-tools
Collecting camel-tools
  Downloading camel_tools-1.2.0.tar.gz (58 kB)
     |████████████████████████████████| 58 kB 2.8 MB/s 
Collecting future
  Downloading future-0.18.2.tar.gz (829 kB)
     |████████████████████████████████| 829 kB 4.9 MB/s 
Collecting six
  Downloading six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting docopt
  Downloading docopt-0.6.2.tar.gz (25 kB)
Collecting cachetools
  Downloading cachetools-5.0.0-py3-none-any.whl (9.1 kB)
Collecting numpy
  Downloading numpy-1.22.2-cp39-cp39-macosx_11_0_arm64.whl (12.8 MB)
     |████████████████████████████████| 12.8 MB 40.1 MB/s 
Collecting scipy
  Downloading scipy-1.8.0-cp39-cp39-macosx_12_0_arm64.whl (28.7 MB)
     |████████████████████████████████| 28.7 MB 17.7 MB/s 
Collecting pandas
  Downloading pandas-1.4.1-cp39-cp39-macosx_11_0_arm64.whl (10.5 MB)
     |████████████████████████████████| 10.5 MB 13.7 MB/s 
Collecting scikit-learn
  Downloading scikit_learn-1.0.2-cp39-cp39-macosx_12_0_arm64.whl (6.9 MB)
     |████████████████████████████████| 6.9 MB 15.1 MB/s 
Collecting dill
  Downloading dill-0.3.4-py2.py3-none-any.whl (86 kB)
     |████████████████████████████████| 86 kB 15.1 MB/s 
Collecting torch>=1.3
  Downloading torch-1.10.2-cp39-none-macosx_11_0_arm64.whl (44.6 MB)
     |████████████████████████████████| 44.6 MB 13.9 MB/s 
Collecting transformers>=3.0.2
  Downloading transformers-4.17.0-py3-none-any.whl (3.8 MB)
     |████████████████████████████████| 3.8 MB 13.3 MB/s 
Collecting editdistance
  Downloading editdistance-0.6.0-cp39-cp39-macosx_11_0_arm64.whl (19 kB)
Collecting requests
  Downloading requests-2.27.1-py2.py3-none-any.whl (63 kB)
     |████████████████████████████████| 63 kB 5.4 MB/s 
Collecting camel-kenlm
  Downloading camel-kenlm-2021.12.27.tar.gz (418 kB)
     |████████████████████████████████| 418 kB 12.7 MB/s 
Collecting typing-extensions
  Downloading typing_extensions-4.1.1-py3-none-any.whl (26 kB)
Collecting sacremoses
  Downloading sacremoses-0.0.47-py2.py3-none-any.whl (895 kB)
     |████████████████████████████████| 895 kB 12.6 MB/s 
Collecting pyyaml>=5.1
  Downloading PyYAML-6.0-cp39-cp39-macosx_11_0_arm64.whl (173 kB)
     |████████████████████████████████| 173 kB 15.6 MB/s 
Collecting regex!=2019.12.17
  Downloading regex-2022.3.2-cp39-cp39-macosx_11_0_arm64.whl (281 kB)
     |████████████████████████████████| 281 kB 18.8 MB/s 
Collecting tokenizers!=0.11.3,>=0.11.1
  Downloading tokenizers-0.11.6-cp39-cp39-macosx_12_0_arm64.whl (3.4 MB)
     |████████████████████████████████| 3.4 MB 14.8 MB/s 
Collecting tqdm>=4.27
  Downloading tqdm-4.63.0-py2.py3-none-any.whl (76 kB)
     |████████████████████████████████| 76 kB 14.5 MB/s 
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.4.0-py3-none-any.whl (67 kB)
     |████████████████████████████████| 67 kB 12.5 MB/s 
Collecting packaging>=20.0
  Downloading packaging-21.3-py3-none-any.whl (40 kB)
     |████████████████████████████████| 40 kB 16.1 MB/s 
Collecting filelock
  Downloading filelock-3.6.0-py3-none-any.whl (10.0 kB)
Collecting pyparsing!=3.0.5,>=2.0.2
  Downloading pyparsing-3.0.7-py3-none-any.whl (98 kB)
     |████████████████████████████████| 98 kB 29.4 MB/s 
Collecting pytz>=2020.1
  Downloading pytz-2021.3-py2.py3-none-any.whl (503 kB)
     |████████████████████████████████| 503 kB 52.2 MB/s 
Collecting python-dateutil>=2.8.1
  Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
     |████████████████████████████████| 247 kB 18.0 MB/s 
Collecting certifi>=2017.4.17
  Downloading certifi-2021.10.8-py2.py3-none-any.whl (149 kB)
     |████████████████████████████████| 149 kB 17.1 MB/s 
Collecting charset-normalizer~=2.0.0
  Downloading charset_normalizer-2.0.12-py3-none-any.whl (39 kB)
Collecting idna<4,>=2.5
  Downloading idna-3.3-py3-none-any.whl (61 kB)
     |████████████████████████████████| 61 kB 15.2 MB/s 
Collecting urllib3<1.27,>=1.21.1
  Downloading urllib3-1.26.8-py2.py3-none-any.whl (138 kB)
     |████████████████████████████████| 138 kB 17.0 MB/s 
Collecting joblib
  Downloading joblib-1.1.0-py2.py3-none-any.whl (306 kB)
     |████████████████████████████████| 306 kB 12.0 MB/s 
Collecting click
  Downloading click-8.0.4-py3-none-any.whl (97 kB)
     |████████████████████████████████| 97 kB 30.7 MB/s 
Collecting threadpoolctl>=2.0.0
  Downloading threadpoolctl-3.1.0-py3-none-any.whl (14 kB)
Using legacy 'setup.py install' for camel-tools, since package 'wheel' is not installed.
Using legacy 'setup.py install' for camel-kenlm, since package 'wheel' is not installed.
Using legacy 'setup.py install' for docopt, since package 'wheel' is not installed.
Using legacy 'setup.py install' for future, since package 'wheel' is not installed.
Installing collected packages: urllib3, pyparsing, idna, charset-normalizer, certifi, typing-extensions, tqdm, six, requests, regex, pyyaml, packaging, numpy, joblib, filelock, click, tokenizers, threadpoolctl, scipy, sacremoses, pytz, python-dateutil, huggingface-hub, transformers, torch, scikit-learn, pandas, future, editdistance, docopt, dill, camel-kenlm, cachetools, camel-tools
    Running setup.py install for future ... done
    Running setup.py install for docopt ... done
    Running setup.py install for camel-kenlm ... done
    Running setup.py install for camel-tools ... done
Successfully installed cachetools-5.0.0 camel-kenlm-2021.12.27 camel-tools-1.2.0 certifi-2021.10.8 charset-normalizer-2.0.12 click-8.0.4 dill-0.3.4 docopt-0.6.2 editdistance-0.6.0 filelock-3.6.0 future-0.18.2 huggingface-hub-0.4.0 idna-3.3 joblib-1.1.0 numpy-1.22.2 packaging-21.3 pandas-1.4.1 pyparsing-3.0.7 python-dateutil-2.8.2 pytz-2021.3 pyyaml-6.0 regex-2022.3.2 requests-2.27.1 sacremoses-0.0.47 scikit-learn-1.0.2 scipy-1.8.0 six-1.16.0 threadpoolctl-3.1.0 tokenizers-0.11.6 torch-1.10.2 tqdm-4.63.0 transformers-4.17.0 typing-extensions-4.1.1 urllib3-1.26.8

Expected behavior
camel-tools should install via pip as outlined in the package documentation, but it does not. Currently, the package's setup.py specifies python_requires='>=3.6.0', yet installation fails on Python versions >= 3.10.0, as demonstrated above.
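For illustration only, here is a hypothetical setup.py excerpt (not the released configuration) showing how the constraint could be tightened so that pip refuses unsupported interpreters up front; the exact metadata values are assumptions based on the installed version above:

from setuptools import setup, find_packages

# Hypothetical setup() call; the actual package metadata may differ.
setup(
    name='camel-tools',
    version='1.2.0',
    packages=find_packages(),
    python_requires='>=3.7,<3.11',
)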

Desktop (please complete the following information):

  • OS: macOS Monterey Version 12.2.1 (21D62)
  • Hardware: MacBook Pro (14-inch, 2021); chip: Apple M1 Pro
  • Python versions: 3.10.2, 3.9.10
  • camel-tools default/latest version via pypi (camel_tools-1.1.0.tar.gz)

Additional context
n/a

[QUESTION] Error while loading sentiment_analysis/arabert model

Greetings,
It seems that there's a similar issue that has been closed before, but unfortunately the suggested solutions didn't work in my case.
When trying to load the AraBERT model:

from camel_tools.sentiment import SentimentAnalyzer

sa = SentimentAnalyzer.pretrained()

sentences = [
    'أنا بخير',
    'ابشرك'
]
sentiments = sa.predict(sentences)

I get the following error:

OSError: Can't load config for '/Users/.../.camel_tools/data/sentiment_analysis/arabert'. Make sure that:

- '/Users/.../.camel_tools/data/sentiment_analysis/arabert' is a correct model identifier listed on 'https://huggingface.co/models'

- or '/Users/.../.camel_tools/data/sentiment_analysis/arabert' is the correct path to a directory containing a config.json file

Your assistance is greatly appreciated, thanks in advance!
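For what it's worth, a hedged sanity check (not an official fix) is to verify that the downloaded model directory actually contains a config.json before constructing the analyzer; the path below assumes the default data location:

from pathlib import Path

# Confirm the sentiment analysis data package is where the analyzer will look.
model_dir = Path.home() / '.camel_tools' / 'data' / 'sentiment_analysis' / 'arabert'
print(model_dir, model_dir.is_dir(), (model_dir / 'config.json').is_file())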

[QUESTION] AppData\Roaming\camel_tools\data\data\sentiment_analysis\arabert is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'

Hello, I am trying to test the sentiment analysis tools in CAMeL Tools. I tried to run the following code:

from camel_tools.sentiment import SentimentAnalyzer

sa = SentimentAnalyzer.pretrained()

sentences = [
    'أنا بخير',
    'أنا لست بخير'
]

sentiments = sa.predict(sentences)

print(sentiments)

This error kept occurring:
OSError: C:\Users\s.alzubi\AppData\Roaming\camel_tools\data\data\sentiment_analysis\arabert is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with use_auth_token or log in with huggingface-cli login and pass use_auth_token=True.

I have spent a week trying to solve this problem: I created a Hugging Face account and generated an access token to run the code, but I am still stuck and can't get any further.
Can anyone help me with this?
thank you

[QUESTION] Information on NER component

Describe what you would like to know about CAMeL Tools.

Hello, I wanted to know if you could provide some information regarding the NER component of the library.

In the catalog JSON file, you mention that you are using a fine-tuned AraBERT model, with the specified version being 1.0.0. From there, I wanted to know:

  • whether the base model was indeed AraBERTv1 from this repo?
  • which dataset you used?
  • whether you used the FARASA preprocessing for fine-tuning or your own, given that they used the former for pretraining?

I ask because, while doing some research, I saw that your lab has produced multiple Arabic BERT models, which have the benefit of:

  • having used the camel_tools preprocessing rather than FARASA for both pretraining and fine-tuning
  • having dialect-specific variants, which may be interesting in some cases
  • seeming to outperform AraBERTv1 on NER tasks according to your paper

I was wondering whether you would consider making these models available for use in this library? I know you have released the code and pretrained model, and I am planning on experimenting with this, but I thought it would be a nice addition.


[QUESTION] PoS feature Enhancement request

Describe what you would like to know about CAMeL Tools.
PoS feature enhancement request, especially for the foreign-word tag.
When I use the tagger function with non-Arabic words, it always tags them as 'noun'. Based on your guide, they should be tagged as 'FOREIGN'.

I am using this code:

from camel_tools.tokenizers.word import simple_word_tokenize
from camel_tools.disambig.mle import MLEDisambiguator
from camel_tools.tagger.default import DefaultTagger

mle = MLEDisambiguator.pretrained()
tagger = DefaultTagger(mle, 'pos')

sentence = simple_word_tokenize(' HTML استخدم')

pos_tags = tagger.tag(sentence)

print(pos_tags)

My question is: where is the problem? Is it in my code, or does this feature need an enhancement?
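To see which POS value the model actually assigns to the non-Arabic token, here is a hedged sketch that inspects the disambiguated analyses directly (a sketch only, assuming the MLEDisambiguator.disambiguate() API; not an official diagnostic):

from camel_tools.tokenizers.word import simple_word_tokenize
from camel_tools.disambig.mle import MLEDisambiguator

mle = MLEDisambiguator.pretrained()
words = simple_word_tokenize('استخدم HTML')

# Print the highest-scoring analysis' POS feature for each word, if any.
for d in mle.disambiguate(words):
    pos = d.analyses[0].analysis.get('pos') if d.analyses else None
    print(d.word, pos)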

How can you do the tokenization?

I have been reading the code and I'm not sure where to go from here.

import camel_tools.calima_star.database
import camel_tools.calima_star.analyzer
import camel_tools.disambig.mle
from camel_tools.tokenizers.morphological import MorphologicalTokenizer

db = camel_tools.calima_star.database.CalimaStarDB.builtin_db()
analyzer = camel_tools.calima_star.analyzer.CalimaStarAnalyzer(db)
disambiguator = camel_tools.disambig.mle.MLEDisambiguator(analyzer)
morph = MorphologicalTokenizer(disambiguator)
morph.tokenize('ذهب الولد')

RESULT
This gives me ['ذ', 'ه', 'بِ', ' ', 'أَ', 'لِ', 'وَ', 'لِ', 'د', ' ']

I'm not sure what am I doing wrong and would appreciate some guidance.

Also, the documentation says "The tokenization scheme to use. Defaults to 'atbtok'.", but when I go to this page https://camel-tools.readthedocs.io/en/v0.3.dev0/reference/calima_star_features.html to check which tokenization scheme is used, I can't find it explained there, so I am not sure what 'atbtok' means here.
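For reference, here is a hedged sketch of how the morphological tokenizer is typically driven in later releases, where tokenize() expects a pre-tokenized list of words rather than a raw string (a sketch only, assuming the v1.x MorphologicalTokenizer API):

from camel_tools.tokenizers.word import simple_word_tokenize
from camel_tools.tokenizers.morphological import MorphologicalTokenizer
from camel_tools.disambig.mle import MLEDisambiguator

# Assumes the default MLE disambiguation data package is installed.
mle = MLEDisambiguator.pretrained()
tokenizer = MorphologicalTokenizer(mle, scheme='atbtok')
print(tokenizer.tokenize(simple_word_tokenize('ذهب الولد')))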

camel_tools.ner not found

Hi All,

I am trying to use camel_tools for NER and I get the error that camel_tools.ner is not found.

Thanks,
Akash
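For reference, a minimal sketch of the kind of usage that triggers such an import error when the module is missing; the pretrained() call and predict_sentence() signature below are assumptions about the NER API in recent releases:

from camel_tools.ner import NERecognizer
from camel_tools.tokenizers.word import simple_word_tokenize

# Load the pretrained NER model and label a tokenized sentence.
ner = NERecognizer.pretrained()
labels = ner.predict_sentence(simple_word_tokenize('ذهبت إلى أبوظبي'))
print(labels)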

camel_tools.morphology.errors

Describe the bug
When I try the morphological analysis example from CAMeL_Tools_Guided_Tour.ipynb, I get this bug.
To Reproduce
I copied the example code from the guide and just ran the Python file.
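Since the exact notebook snippet is not reproduced here, the following is a hedged sketch of what a typical morphological analysis call looks like with the v1.x API (it assumes the default morphology data package is installed so that builtin_db() can resolve it):

from camel_tools.morphology.database import MorphologyDB
from camel_tools.morphology.analyzer import Analyzer

# Load the built-in morphology database and analyze a single word.
db = MorphologyDB.builtin_db()
analyzer = Analyzer(db)

for analysis in analyzer.analyze('كتاب'):
    print(analysis['diac'], analysis['pos'])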
Expected behavior
I expected to get the morphological analysis output

Screenshots
[Screenshot attached: "Screenshot from 2021-07-29 21-51-18"]

Desktop (please complete the following information):

  • OS: Ubuntu 20.04.2 LTS
  • Python 3.8.5
  • CAMeL Tools latest version and all data downloaded, installed using pip3
  • I downloaded the CAMeL data using the camel_data full command

[BUG] DialectIdentifier.train() fails: corpus_26_train.tsv not found

I believe there is a bug in DialectIdentifier.train(). I have downloaded the large dataset and added it to the path. However, whenever I try to use the train() method, this error comes up:

No such file or directory: '/home/usr/.camel_tools/data/dialectid/default/corpus_26_train.tsv'

The code I use to produce this error is:

from camel_tools.dialectid import DialectIdentifier

model = DialectIdentifier().train()

I use:

  • Ubuntu 20
  • Python 3.8
  • camel-tools 1.0.1
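For context, a hedged sketch of the pretrained path, which should not require the training TSVs; the pretrained() constructor and the .top field used below are assumptions about the dialectid API, not something confirmed by this issue:

from camel_tools.dialectid import DialectIdentifier

# Load the shipped pretrained dialect ID model instead of retraining it.
did = DialectIdentifier.pretrained()
predictions = did.predict(['شلونك؟'])
print(predictions[0].top)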

dediacritizing Arabic text question

I want to remove tanween fath + alif. In the following code:

from camel_tools.utils.dediac import dediac_ar

sentence_ar = 'اشتريت كتبًا كثيرةً'

sentence_ar_dediac = dediac_ar(sentence_ar)

print(sentence_ar_dediac)

the output is:
اشتريت كتبا كثيرة
Is there a way to also remove the alif so the output will be:
اشتريت كتب كثيرة

I want to apply this feature to many words in my dataset.
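One possible approach, shown here only as a hedged sketch (not a built-in CAMeL Tools option), is to strip the tanween fath together with its seat alif before dediacritizing the rest, assuming a simple regex is acceptable for your data:

import re
from camel_tools.utils.dediac import dediac_ar

FATHATAN = '\u064B'  # tanween fath
ALEF = '\u0627'      # bare alif

def dediac_drop_tanween_alif(text):
    # Remove the alif seat in either character order, then dediacritize.
    text = re.sub(FATHATAN + ALEF + '|' + ALEF + FATHATAN, '', text)
    return dediac_ar(text)

print(dediac_drop_tanween_alif('اشتريت كتبًا كثيرةً'))
# Expected output: اشتريت كتب كثيرة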
