
cliner's People

Contributors

arumshisky, c-cooper, correlator, dsouzadaniel, elesideprojects, kwaco, navd, salmanahmad, shuvoxcd01, tnaumann, wboag


cliner's Issues

Unable to predict using LSTM model

There are several issues when predicting output with the LSTM model. Can anyone help?
Which def generic_predict should we use? Should the commented-out params be uncommented? Why is there a list index out of range error?
If this work has been accepted, why is the author not responding? Very disappointing.

vectors2.txt in LSTM_parameters.txt is not found

When training with LSTM, I got the following error:
`(base) yue@tredgar:~/python_projects/CliNER$ python cliner train --txt data/i2b2_txt --annotations data/i2b2_concept --format i2b2 --model models/i2b2_lstm.model --use-lstm
NO DEV
IN ERROR CHECK
0.1
Creating 90/10 train/dev split
vectorizing words all
TESTING NEW DATSET OBJECT
Load dataset...

Traceback (most recent call last):
File "cliner", line 60, in <module>
main()
File "cliner", line 49, in main
train.main()
File "/home/yue/python_projects/CliNER/code/train.py", line 158, in main
train(training_list, args.model, args.format, args.use_lstm, logfile=args.log, val=val_list, test=test_list)
File "/home/yue/python_projects/CliNER/code/train.py", line 194, in train
model.train(train_docs, val=val_docs, test=test_docs)
File "/home/yue/python_projects/CliNER/code/model.py", line 199, in train
test_sents=test_sents, test_labels=test_labels)
File "/home/yue/python_projects/CliNER/code/model.py", line 253, in train_fit
dev_split=dev_split )
File "/home/yue/python_projects/CliNER/code/model.py", line 410, in generic_train
dataset.load_dataset(Datasets_tokens,Datasets_labels,"",parameters)
File "/home/yue/python_projects/CliNER/code/DatasetCliner_experimental.py", line 210, in load_dataset
token_to_vector = hd.load_pretrained_token_embeddings(parameters)
File "/home/yue/python_projects/CliNER/code/helper_dataset.py", line 210, in load_pretrained_token_embeddings
file_input = codecs.open(parameters['token_pretrained_embedding_filepath'], 'r', 'UTF-8')
File "/home/yue/anaconda3/lib/python3.7/codecs.py", line 898, in open
file = builtins.open(filename, mode, buffering)
FileNotFoundError: [Errno 2] No such file or directory: 'vectors2.txt'
`
It seems that the vectors2.txt file named on the first line of LSTM_parameters.txt is missing. Is this supposed to be the GloVe vectors?
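A quick pre-flight check can confirm whether the embeddings file exists before a training run is launched. This is only a minimal sketch: the `token_pretrained_embedding_filepath` key is taken from the traceback above, while `check_embedding_path` is a hypothetical helper and not part of CliNER.

```python
import os


def check_embedding_path(parameters):
    """Return the embeddings path if the file exists, else raise a clear error.

    `parameters` is assumed to be a dict shaped like the one in the traceback,
    holding the key 'token_pretrained_embedding_filepath'.
    """
    path = parameters['token_pretrained_embedding_filepath']
    if not os.path.isfile(path):
        # Show the resolved path too, since a relative name such as
        # 'vectors2.txt' is looked up from the current working directory.
        raise FileNotFoundError(
            "Pretrained embeddings not found at %r (resolved to %r)"
            % (path, os.path.abspath(path)))
    return path
```

Because the path in the error is relative, the lookup depends on the directory the command is run from, which is worth ruling out before assuming the file was never shipped.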

unable to run

python cliner predict --txt data/examples/ex_doc.txt --out data/predictions --model models/silver.crf --format i2b2
Traceback (most recent call last):
File "cliner", line 60, in <module>
main()
File "cliner", line 52, in main
predict.main()
File "/home/caprice/serverless-nodejs-app/CliNER/code/predict.py", line 79, in main
predict(files, args.model, args.output, format=format)
File "/home/caprice/serverless-nodejs-app/CliNER/code/predict.py", line 96, in predict
model = pickle.load(f)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf4 in position 6: ordinal not in range(128)
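One common cause of this exact error is loading, under Python 3, a pickle that was written by Python 2. Whether models/silver.crf was produced that way is an assumption here, not something the traceback confirms. The sketch below reproduces the failure with a tiny hand-written Python 2 style pickle and shows the standard `encoding='latin1'` fallback; `load_legacy_pickle` is a hypothetical helper name.

```python
import pickle

# Hand-written protocol-2 pickle of a Python 2 byte string that contains
# the non-ASCII byte 0xf4 (illustrative data, not a CliNER model).
PY2_PICKLE = b'\x80\x02U\x04caf\xf4.'


def load_legacy_pickle(data):
    """Unpickle bytes, retrying with latin1 if ASCII decoding fails."""
    try:
        return pickle.loads(data)
    except UnicodeDecodeError:
        # Python 3 decodes Python 2 str objects as ASCII by default;
        # latin1 maps every byte to a code point, so it cannot fail here.
        return pickle.loads(data, encoding='latin1')


value = load_legacy_pickle(PY2_PICKLE)
```

For a file on disk the equivalent call would be `pickle.load(f, encoding='latin1')`. As always, only unpickle files from a source you trust.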

error when trying to dump the model into tmp file

Hi, thanks for this beautiful project. I am trying to run this code but hit an error: no such file or directory "/tmp\tmp20_08sb3crf_temp". It is raised in crf.py, line 176, in predict:
os_handle,tmp_file = tempfile.mkstemp(dir='/tmp', suffix="crf_temp")

thanks for your help
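The mixed separators in the reported path ("/tmp\...") suggest the code is running on Windows, where a hard-coded '/tmp' directory usually does not exist. A minimal sketch of a portable alternative is to omit `dir=` so that `tempfile` picks the platform's own temporary directory. The helper name `make_temp_model_file` is hypothetical and not part of crf.py.

```python
import os
import tempfile


def make_temp_model_file(suffix="crf_temp"):
    """Create an empty temp file in the platform's temp directory.

    Returns the file path; the caller is responsible for deleting it.
    """
    # Without dir=, mkstemp uses tempfile.gettempdir(), which resolves to
    # a valid location on Linux, macOS and Windows alike.
    os_handle, tmp_file = tempfile.mkstemp(suffix=suffix)
    os.close(os_handle)  # close the OS-level handle so the path can be reopened
    return tmp_file
```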

Directly interacting with code examples

I was wondering whether some getting-started examples could be added that call functions directly instead of running modules from the command line. It would also help to know the general design philosophy, if the intent is for people to work from the command line only. I believe it was Rasa that decided, when it went to v1.0, to move from programmatic access to command-line-only access, which made it very hard to call anything from within one's own code.
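Until a documented programmatic API exists, one workaround is to wrap the command line in a small Python function. The sketch below only assembles and runs the `predict` command quoted elsewhere in these issues (`--txt`, `--out`, `--model`, `--format`); `build_predict_command` and `run_predict` are hypothetical names, and the assumption is that the `cliner` script sits in the repository root.

```python
import subprocess
import sys


def build_predict_command(txt, out, model, fmt="i2b2", script="cliner"):
    """Assemble the argv list for a `python cliner predict ...` call."""
    return [sys.executable, script, "predict",
            "--txt", txt,
            "--out", out,
            "--model", model,
            "--format", fmt]


def run_predict(txt, out, model, fmt="i2b2", repo_dir="."):
    """Run the predict command from repo_dir; raises if it exits non-zero."""
    cmd = build_predict_command(txt, out, model, fmt)
    return subprocess.run(cmd, cwd=repo_dir, check=True)
```

A call such as `run_predict("data/examples/ex_doc.txt", "data/predictions", "models/silver.crf", repo_dir="CliNER")` would then mirror the command-line usage from within one's own code.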

format.py not working

I would like to use the format.py script to convert i2b2 files to XML format.
I think this script still needs to be updated: it imports notes.Note, which doesn't exist in this repo.

Cliner Training foo.model issue

Hello, I have been trying to run the example in Google Colab and am having trouble creating foo.model with the command provided:

python cliner train --txt data/examples/ex_doc.txt --annotations data/examples/ex_doc.con --format i2b2 --model models/foo.model

When I run this, I get the following message:
NO DEV
Creating 90/10 train/dev split
vectorizing words all
training classifiers all
Traceback (most recent call last):
File "cliner", line 60, in <module>
main()
File "cliner", line 49, in main
train.main()
File "/content/gdrive/My Drive/Colab Notebooks/CliNER/code/train.py", line 155, in main
train(training_list, args.model, args.format, args.use_lstm, logfile=args.log, val=val_list, test=test_list)
File "/content/gdrive/My Drive/Colab Notebooks/CliNER/code/train.py", line 187, in train
model.train(train_docs, val=val_docs, test=test_docs)
File "/content/gdrive/My Drive/Colab Notebooks/CliNER/code/model.py", line 199, in train
test_sents=test_sents, test_labels=test_labels)
File "/content/gdrive/My Drive/Colab Notebooks/CliNER/code/model.py", line 230, in train_fit
dev_split=dev_split )
File "/content/gdrive/My Drive/Colab Notebooks/CliNER/code/model.py", line 584, in generic_train
test_X=test_X, test_Y=test_Y)
File "/content/gdrive/My Drive/Colab Notebooks/CliNER/code/machine_learning/crf.py", line 149, in train
train_pred = predict(model, X)
File "/content/gdrive/My Drive/Colab Notebooks/CliNER/code/machine_learning/crf.py", line 181, in predict
clf_byte = bytearray(clf, 'latin1')
TypeError: encoding or errors without a string argument

How do I fix this? Thank you
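In Python 3, `bytearray(x, 'latin1')` raises this TypeError whenever `x` is not a `str`, for example when it is already `bytes`. That `clf` is `bytes` at this point in crf.py is an assumption based on the error message alone. A minimal sketch of a conversion that accepts both types, using the hypothetical helper name `to_bytearray`:

```python
def to_bytearray(clf):
    """Convert a serialized model held as str or bytes into a bytearray."""
    if isinstance(clf, str):
        # Text needs an explicit encoding; latin1 matches the original call.
        return bytearray(clf, 'latin1')
    # bytes (or any other bytes-like object) must be passed without an encoding.
    return bytearray(clf)
```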

cliner: command not found

How do you install cliner on the system?
I have followed all the steps listed in the README of this project.

evaluate not working

When running evaluate.py (official i2b2 evaluation script), I get the following error:

The Paramaters You Gave Were Invalid.
Valid Paramaters Are As Follows:
Default -> If only given -rcp and -scp everything else will be set to all and batch.
-rcp Classpath of the reference directory (String) *REQUIRED*
-scp Classpath of the system directory (String) *REQUIRED*
-ft all (for ratio tests on .ast,.rel and .con files)
-ft ast (for ratio tests on .ast files only)
-ft rel (for ratio tests on .rel files only)
-ft con (for ratio tests on .con files only)
-ex all (for both exact an inexact ratio tests)
-ex exact (for exact ratio tests only)
-ex inexact (for inexact ratio tests only)
-ag all (for both aggregated and seperate ratio tests)
-ag seperate (for seperate ratio tests only)
-ag aggregated (for aggregated ratio tests only)
-ba batch (testing in a batch format)
-ba perfile (testing for each file)
Your Given System Filepath is Invalid.
The Paramaters You Gave Were Invalid.
Valid Paramaters Are As Follows:
Default -> If only given -rcp and -scp everything else will be set to all and batch.
-rcp Classpath of the reference directory (String) *REQUIRED*
-scp Classpath of the system directory (String) *REQUIRED*
-ft all (for ratio tests on .ast,.rel and .con files)
-ft ast (for ratio tests on .ast files only)
-ft rel (for ratio tests on .rel files only)
-ft con (for ratio tests on .con files only)
-ex all (for both exact an inexact ratio tests)
-ex exact (for exact ratio tests only)
-ex inexact (for inexact ratio tests only)
-ag all (for both aggregated and seperate ratio tests)
-ag seperate (for seperate ratio tests only)
-ag aggregated (for aggregated ratio tests only)
-ba batch (testing in a batch format)
-ba perfile (testing for each file)

Unable to run lstm model successfully

Below is the command I am trying:

$ python cliner train --txt "data/i2b2/.txt" --annotations "data/i2b2/.con" --format i2b2 --model models/i2b2_lstm.model --use-lstm

Here is the trace:

NO DEV
IN ERROR CHECK
0.1
Creating 90/10 train/dev split
vectorizing words all
TESTING NEW DATSET OBJECT
Traceback (most recent call last):
File "cliner", line 60, in <module>
main()
File "cliner", line 49, in main
train.main()
File "/Users/ag33366/CliNER/code/train.py", line 155, in main
train(training_list, args.model, args.format, args.use_lstm, logfile=args.log, val=val_list, test=test_list)
File "/Users/ag33366/CliNER/code/train.py", line 187, in train
model.train(train_docs, val=val_docs, test=test_docs)
File "/Users/ag33366/CliNER/code/model.py", line 199, in train
test_sents=test_sents, test_labels=test_labels)
File "/Users/ag33366/CliNER/code/model.py", line 253, in train_fit
dev_split=dev_split )
File "/Users/ag33366/CliNER/code/model.py", line 382, in generic_train
dataset = Exp.Dataset()
NameError: name 'Exp' is not defined

I am able to run the CRF model successfully but am having issues with the LSTM. Could you please reply as soon as possible? Thank you.

Tokenization question

If we want to use the out-of-the-box silver model to do prediction, what tokenization scheme is expected? NLTK word_tokenize? Or should I use the tools/tok.py file?

UMLS: package utilities isn't available any more

To use the UMLS subsets, after the database is built I get an error that the module utilities can't be loaded. I can't install it because it seems to be abandoned. Here is a related Stack Overflow post:
https://stackoverflow.com/questions/44143332/issues-while-installing-utilities-module-in-python

As you can see, the official page for the package offers no way to install it through pip or to download the source files. Link below.
https://pypi.org/project/utilities/

This situation is really odd, because even when a package is taken out of the repository an alternative should be provided. I didn't find one when browsing through similar packages.

It would be nice if someone took on this issue and made UMLS work with CliNER again.

UMLS utility

The predictions are the same irrespective of whether UMLS tables are used or not.
I don't know if we need to do anything other than:

  1. download UMLS tables into a $CLINER/umls_tables directory
  2. change the config.txt file in $CLINER
    UMLS $CLINER/umls_tables instead of UMLS None

If anybody can share how to fix this issue it will be really helpful!
Thank you in advance!!!
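Before comparing predictions, it may help to confirm that the config change is actually being picked up. The sketch below assumes config.txt holds whitespace-separated "KEY VALUE" lines, as the `UMLS None` entry quoted above suggests; `read_umls_setting` and `umls_is_enabled` are hypothetical helpers, not CliNER functions.

```python
import os


def read_umls_setting(config_path):
    """Return the value of the UMLS entry in config.txt, or None if absent."""
    with open(config_path) as f:
        for line in f:
            parts = line.split(None, 1)  # split on the first whitespace run
            if len(parts) == 2 and parts[0] == "UMLS":
                return parts[1].strip()
    return None


def umls_is_enabled(config_path):
    """True only if the UMLS entry points at an existing directory."""
    value = read_umls_setting(config_path)
    if value is None or value == "None":
        return False
    # Expand variables such as $CLINER in case the config stores them literally.
    return os.path.isdir(os.path.expandvars(value))
```

If this returns False after the edit, the likely culprits are an unexpanded `$CLINER` variable or a path that does not match where the tables were downloaded.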

Cliner is not recognized

I have followed all the steps in the README, but after installation I get "'cliner' is not recognized as an internal or external command". How can I use cliner? Also, the cliner file is in an unrecognizable format. Is there any way I can use the cliner tool? Please provide a detailed approach.

LSTM

Is there a pre-trained LSTM model, and how can I predict using the LSTM?
