text-machine-lab / cliner

This project forked from mit-medg/clicon

Clinical Named Entity Recognition system (CliNER) is an open-source natural language processing system for named entity recognition in clinical text of electronic health records.

Home Page: http://text-machine.cs.uml.edu/cliner/

Python 94.91% Perl 5.09%

cliner's Issues

Unable to predict using LSTM model

There are several issues when predicting output with the LSTM model. Can anyone help?
Which definition of generic_predict should we use? Which of the commented-out params should we uncomment? Why is there a list index out of range error?
If this work has been accepted, why is the author not responding? Very disappointing!

vectors2.txt in LSTM_parameters.txt is not found

When training with LSTM, I got the following error:
`(base) yue@tredgar:~/python_projects/CliNER$ python cliner train --txt data/i2b2_txt --annotations data/i2b2_concept --format i2b2 --model models/i2b2_lstm.model --use-lstm
NO DEV
IN ERROR CHECK
0.1
Creating 90/10 train/dev split
vectorizing words all
TESTING NEW DATSET OBJECT
Load dataset...

Traceback (most recent call last):
File "cliner", line 60, in <module>
main()
File "cliner", line 49, in main
train.main()
File "/home/yue/python_projects/CliNER/code/train.py", line 158, in main
train(training_list, args.model, args.format, args.use_lstm, logfile=args.log, val=val_list, test=test_list)
File "/home/yue/python_projects/CliNER/code/train.py", line 194, in train
model.train(train_docs, val=val_docs, test=test_docs)
File "/home/yue/python_projects/CliNER/code/model.py", line 199, in train
test_sents=test_sents, test_labels=test_labels)
File "/home/yue/python_projects/CliNER/code/model.py", line 253, in train_fit
dev_split=dev_split )
File "/home/yue/python_projects/CliNER/code/model.py", line 410, in generic_train
dataset.load_dataset(Datasets_tokens,Datasets_labels,"",parameters)
File "/home/yue/python_projects/CliNER/code/DatasetCliner_experimental.py", line 210, in load_dataset
token_to_vector = hd.load_pretrained_token_embeddings(parameters)
File "/home/yue/python_projects/CliNER/code/helper_dataset.py", line 210, in load_pretrained_token_embeddings
file_input = codecs.open(parameters['token_pretrained_embedding_filepath'], 'r', 'UTF-8')
File "/home/yue/anaconda3/lib/python3.7/codecs.py", line 898, in open
file = builtins.open(filename, mode, buffering)
FileNotFoundError: [Errno 2] No such file or directory: 'vectors2.txt'
`
It seems that the vectors2.txt file referenced on the first line of LSTM_parameters.txt is missing. Is this supposed to be the GloVe vectors?
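
The traceback shows the failure occurring when parameters['token_pretrained_embedding_filepath'] is opened. A minimal sketch of a pre-flight check that reports the problem before training starts, assuming the parameters are available as a plain dict with that key (the function name check_embedding_path is hypothetical and not part of CliNER):

```python
import os


def check_embedding_path(parameters):
    """Return the embedding path, or raise a descriptive error if it is missing.

    Assumes `parameters` is a dict holding the key seen in the traceback.
    This helper is hypothetical and not part of CliNER.
    """
    path = parameters.get("token_pretrained_embedding_filepath", "")
    if not path or not os.path.isfile(path):
        raise FileNotFoundError(
            "Pretrained embedding file not found: %r (resolved to %r). "
            "Point the setting at an existing vectors file."
            % (path, os.path.abspath(path))
        )
    return path
```

A relative name such as vectors2.txt is resolved against the current working directory, so the resolved path in the message shows where the file was expected.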

unable to run

python cliner predict --txt data/examples/ex_doc.txt --out data/predictions --model models/silver.crf --format i2b2
Traceback (most recent call last):
File "cliner", line 60, in <module>
main()
File "cliner", line 52, in main
predict.main()
File "/home/caprice/serverless-nodejs-app/CliNER/code/predict.py", line 79, in main
predict(files, args.model, args.output, format=format)
File "/home/caprice/serverless-nodejs-app/CliNER/code/predict.py", line 96, in predict
model = pickle.load(f)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf4 in position 6: ordinal not in range(128)

error when trying to dump the model into tmp file

Hi, thanks for this beautiful project. I am trying to run this code but have an issue: no such file or directory "/tmp\tmp20_08sb3crf_temp" when executing crf.py, line 176, in predict:
os_handle,tmp_file = tempfile.mkstemp(dir='/tmp', suffix="crf_temp")

thanks for your help
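
The mixed separators in the reported path suggest the code is running on Windows, where a hardcoded '/tmp' directory usually does not exist. A minimal sketch of a portable alternative, assuming the only requirement is a writable temporary file: omitting dir lets tempfile choose the platform default. The helper name make_crf_temp_file is hypothetical.

```python
import os
import tempfile


def make_crf_temp_file(suffix="crf_temp"):
    """Create a temporary file in the platform's default temp directory."""
    # Without dir=..., mkstemp uses tempfile.gettempdir(), which resolves
    # to a valid location on Linux, macOS, and Windows.
    os_handle, tmp_file = tempfile.mkstemp(suffix=suffix)
    os.close(os_handle)  # close the OS-level handle; reopen by path as needed
    return tmp_file
```

The caller is responsible for deleting the file (for example with os.remove) once it is no longer needed.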

Directly interacting with code examples

I was wondering if some getting started examples could be added that would directly call functions instead of running modules from the command line. Also, it would be helpful to know about the general design philosophy if it is intended to have people call from the command line. I think it was Rasa which made the decision when it went to v 1.0 to go from programmatic access to command-line only access making it very hard to call anything from within one's own code.
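
Until such examples exist, one workaround is to wrap the command-line entry point from Python. A minimal sketch, assuming the script is run from the repository root and using only the predict flags that appear in another issue on this page (--txt, --out, --model, --format); the helper names build_predict_command and run_predict and the repo_dir argument are hypothetical:

```python
import subprocess
import sys


def build_predict_command(txt, out, model, fmt="i2b2"):
    """Build the argument list for the `cliner predict` script."""
    return [
        sys.executable, "cliner", "predict",
        "--txt", txt,
        "--out", out,
        "--model", model,
        "--format", fmt,
    ]


def run_predict(repo_dir, txt, out, model, fmt="i2b2"):
    """Run the prediction script from the repository root (assumed layout)."""
    cmd = build_predict_command(txt, out, model, fmt)
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run(cmd, cwd=repo_dir, check=True)
```

This still goes through the command line, so it is not a substitute for a documented Python API, but it does let predictions be triggered from one's own code.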

format.py not working

I would like to use the format.py script to convert i2b2 files to xml format.
I think that this script still needs to be changed. It is importing notes.Note which doesn't exist in this repo.

Python 3 compatibility issue -- pickle.load(), encoding argument

Line 96 in predict.py:
model = pickle.load(f,encoding = 'latin1')

Error:
TypeError: load() got an unexpected keyword argument 'encoding'

Temporary work-around:
For Python 3, eliminate encoding = [] arg entirely. Documentation specifies that the optional args are only relevant for Python 2.x.
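
For context: in Python 3, pickle.load accepts encoding and errors keyword arguments, which control how 8-bit strings pickled by Python 2 are decoded; Python 2's pickle.load takes no such arguments. A TypeError about an unexpected encoding keyword therefore suggests the script was run by a Python 2 interpreter. Conversely, omitting the argument under Python 3 can produce a UnicodeDecodeError from the 'ascii' codec when the pickle was written by Python 2. A minimal sketch of a loader that works under either version (the name load_model is hypothetical, and latin1 is the value used in predict.py as quoted above):

```python
import pickle
import sys


def load_model(path):
    """Load a pickled model under Python 2 or Python 3."""
    with open(path, "rb") as f:
        if sys.version_info[0] >= 3:
            # Python 3: decode Python 2 byte strings as latin1.
            return pickle.load(f, encoding="latin1")
        # Python 2: pickle.load has no encoding argument.
        return pickle.load(f)
```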

Cliner Training foo.model issue

Hello, I have been trying to run the example in Google Colab and am having trouble creating foo.model with the provided command.

python cliner train --txt data/examples/ex_doc.txt --annotations data/examples/ex_doc.con --format i2b2 --model models/foo.model

When I run this, I got the following message:
NO DEV
Creating 90/10 train/dev split
vectorizing words all
training classifiers all
Traceback (most recent call last):
File "cliner", line 60, in <module>
main()
File "cliner", line 49, in main
train.main()
File "/content/gdrive/My Drive/Colab Notebooks/CliNER/code/train.py", line 155, in main
train(training_list, args.model, args.format, args.use_lstm, logfile=args.log, val=val_list, test=test_list)
File "/content/gdrive/My Drive/Colab Notebooks/CliNER/code/train.py", line 187, in train
model.train(train_docs, val=val_docs, test=test_docs)
File "/content/gdrive/My Drive/Colab Notebooks/CliNER/code/model.py", line 199, in train
test_sents=test_sents, test_labels=test_labels)
File "/content/gdrive/My Drive/Colab Notebooks/CliNER/code/model.py", line 230, in train_fit
dev_split=dev_split )
File "/content/gdrive/My Drive/Colab Notebooks/CliNER/code/model.py", line 584, in generic_train
test_X=test_X, test_Y=test_Y)
File "/content/gdrive/My Drive/Colab Notebooks/CliNER/code/machine_learning/crf.py", line 149, in train
train_pred = predict(model, X)
File "/content/gdrive/My Drive/Colab Notebooks/CliNER/code/machine_learning/crf.py", line 181, in predict
clf_byte = bytearray(clf, 'latin1')
TypeError: encoding or errors without a string argument

How do I fix this? Thank you
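
The message indicates that clf is not a str: in Python 3, bytearray accepts an encoding only when its first argument is a string, so passing bytes together with 'latin1' raises this TypeError. A minimal sketch of a conversion that tolerates both types, assuming clf holds the serialized model as either bytes or str (the helper to_bytearray is hypothetical and not part of CliNER):

```python
def to_bytearray(clf):
    """Convert a serialized model held as bytes or str into a bytearray."""
    if isinstance(clf, (bytes, bytearray)):
        # Already binary data: no encoding argument is allowed here.
        return bytearray(clf)
    # Text data: latin1 maps code points 0-255 to single bytes.
    return bytearray(clf, "latin1")
```

Under this assumption, the line in crf.py would call the helper instead of bytearray(clf, 'latin1').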

cliner: command not found

How do you install cliner on the system?
I have followed all the steps listed in the README of this project.

evaluate not working

When running evaluate.py (official i2b2 evaluation script), I get the following error:

The Paramaters You Gave Were Invalid.
Valid Paramaters Are As Follows:
Default -> If only given -rcp and -scp everything else will be set to all and batch.
-rcp Classpath of the reference directory (String) *REQUIRED*
-scp Classpath of the system directory (String) *REQUIRED*
-ft all (for ratio tests on .ast,.rel and .con files)
-ft ast (for ratio tests on .ast files only)
-ft rel (for ratio tests on .rel files only)
-ft con (for ratio tests on .con files only)
-ex all (for both exact an inexact ratio tests)
-ex exact (for exact ratio tests only)
-ex inexact (for inexact ratio tests only)
-ag all (for both aggregated and seperate ratio tests)
-ag seperate (for seperate ratio tests only)
-ag aggregated (for aggregated ratio tests only)
-ba batch (testing in a batch format)
-ba perfile (testing for each file)
Your Given System Filepath is Invalid.
The Paramaters You Gave Were Invalid.
Valid Paramaters Are As Follows:
Default -> If only given -rcp and -scp everything else will be set to all and batch.
-rcp Classpath of the reference directory (String) *REQUIRED*
-scp Classpath of the system directory (String) *REQUIRED*
-ft all (for ratio tests on .ast,.rel and .con files)
-ft ast (for ratio tests on .ast files only)
-ft rel (for ratio tests on .rel files only)
-ft con (for ratio tests on .con files only)
-ex all (for both exact an inexact ratio tests)
-ex exact (for exact ratio tests only)
-ex inexact (for inexact ratio tests only)
-ag all (for both aggregated and seperate ratio tests)
-ag seperate (for seperate ratio tests only)
-ag aggregated (for aggregated ratio tests only)
-ba batch (testing in a batch format)
-ba perfile (testing for each file)

Unable to run lstm model successfully

Below is the command I am trying:

$ python cliner train --txt "data/i2b2/.txt" --annotations "data/i2b2/.con" --format i2b2 --model models/i2b2_lstm.model --use-lstm

Here is the trace:

NO DEV
IN ERROR CHECK
0.1
Creating 90/10 train/dev split
vectorizing words all
TESTING NEW DATSET OBJECT
Traceback (most recent call last):
File "cliner", line 60, in <module>
main()
File "cliner", line 49, in main
train.main()
File "/Users/ag33366/CliNER/code/train.py", line 155, in main
train(training_list, args.model, args.format, args.use_lstm, logfile=args.log, val=val_list, test=test_list)
File "/Users/ag33366/CliNER/code/train.py", line 187, in train
model.train(train_docs, val=val_docs, test=test_docs)
File "/Users/ag33366/CliNER/code/model.py", line 199, in train
test_sents=test_sents, test_labels=test_labels)
File "/Users/ag33366/CliNER/code/model.py", line 253, in train_fit
dev_split=dev_split )
File "/Users/ag33366/CliNER/code/model.py", line 382, in generic_train
dataset = Exp.Dataset()
NameError: name 'Exp' is not defined

I am able to run the CRF model successfully but am having issues with the LSTM. Could you please reply as soon as possible? Thank you.

cliner command not found, after using all the steps from README.

Comparison to word "blood" hardcoded in get_cui

Could this line be a leftover debug statement?

https://github.com/text-machine-lab/CliNER/blob/master/code/feature_extraction/umls_dir/interpret_umls.py#L192

Tokenization question

If we want to use the out-of-the-box silver model to do prediction, what tokenization scheme is expected? NLTK word_tokenize? Or should I use the tools/tok.py file?

UMLS: package utilities isn't available any more

To use the UMLS subsets, after the database is built I get an error that the module utilities can't be loaded. I can't install it because it seems to be abandoned. Here is a link to a Stack Overflow post:
https://stackoverflow.com/questions/44143332/issues-while-installing-utilities-module-in-python

As you can see, the official page for the package doesn't show how to install it through pip or how to download the source files. Link below.
https://pypi.org/project/utilities/

This situation is really odd, because even if you take a package out of the repository you should provide an alternative. But I didn't see an alternative when browsing through similar packages.

It would be nice if someone could take on this issue and make UMLS work with CliNER again.

UMLS utility

The predictions are the same regardless of whether the UMLS tables are used.
I don't know if we need to do anything other than:

download the UMLS tables into a $CLINER/umls_tables directory
change the config.txt file in $CLINER to read UMLS $CLINER/umls_tables instead of UMLS None

If anybody can share how to fix this issue, it would be really helpful!
Thank you in advance!
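
One way to rule out a configuration problem is to check what the config file resolves to. A minimal sketch, assuming config.txt holds whitespace-separated KEY VALUE pairs as in the UMLS None line quoted above; the helpers read_config and umls_enabled are hypothetical and do not reflect how CliNER itself reads the file:

```python
import os


def read_config(path):
    """Parse whitespace-separated KEY VALUE lines into a dict."""
    settings = {}
    with open(path) as f:
        for line in f:
            parts = line.split(None, 1)
            if len(parts) == 2:
                settings[parts[0]] = parts[1].strip()
    return settings


def umls_enabled(settings):
    """Return True only if UMLS points at an existing directory."""
    value = settings.get("UMLS", "None")
    # Expand $CLINER-style variables before testing the path.
    return value != "None" and os.path.isdir(os.path.expandvars(value))
```

If umls_enabled returns False even after editing the file, the path is likely not being resolved (for example, because the CLINER environment variable is unset), which could explain identical predictions.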

Cliner is not recognized

I have followed all the steps in the README, but after installation I get "'cliner' is not recognized as an internal or external command". How can I use cliner? Also, the cliner file appears to be in an unrecognized format. Is there any way I can use the cliner tool? Please describe the approach in detail.

LSTM

Is there a pre-trained LSTM model, and how can I predict using the LSTM?