
linkedbooksdeepreferenceparsing's People

Contributors

giovanni1085, ra-danny

linkedbooksdeepreferenceparsing's Issues

IndexError: list index out of range

When running python main_threeTasks.py (from ./crf_baseline), I get the following error:

/home/matthew/Documents/wellcome/LinkedBooksDeepReferenceParsing/crf_baseline/build/virtualenv/lib/python3.7/site-packages/sklearn/externals/joblib/__init__.py:15: DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
  warnings.warn(msg, category=DeprecationWarning)
Traceback (most recent call last):
  File "main_threeTasks.py", line 22, in <module>
    X_train_w, train_t1, train_t2, train_t3 = load_data("../dataset/clean_train.txt")
  File "/home/matthew/Documents/wellcome/LinkedBooksDeepReferenceParsing/crf_baseline/code/utils.py", line 50, in load_data
    tags4.append(w[4])
IndexError: list index out of range
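A likely cause is a line in ../dataset/clean_train.txt with fewer than five fields. utils.py reads `w[4]`, so any non-blank line that splits into fewer than five columns would raise this IndexError. The sketch below lists such lines. It assumes, without checking utils.py, that columns are whitespace-separated by default and that blank lines only separate references. `find_short_lines` is a hypothetical helper and is not part of the repository.

```python
def find_short_lines(lines, min_columns=5, sep=None):
    """Yield (line_number, text) for non-blank lines with too few columns.

    Hypothetical diagnostic: load_data indexes w[4], so each data line is
    assumed to need at least 5 fields. sep=None splits on any whitespace;
    pass sep="\t" if the file turns out to be tab-separated.
    """
    for lineno, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text.strip():
            # Blank lines are assumed to be sequence separators, so skip them.
            continue
        if len(text.split(sep)) < min_columns:
            yield lineno, text
```

To use it, pass an open file, e.g. iterate over `find_short_lines(open("../dataset/clean_train.txt", encoding="utf-8"))` and print each reported line number.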

Tokeniser

Hi @Giovanni1085, do you have any information about the tokeniser that was used on the training texts?

Foldering

Excellent code, Danny!

Can you please put all the Keras code into a keras/ folder, and add a README there with the details on how to use it?

The general structure we will have is:
data/
keras/
tensorflow/
...

Within each, a README with details. I will store the word embeddings in a separate location.

Thanks

Update TODO list

Remember to update the TODO list by striking through completed items and adding or editing entries as needed.

Incomplete list of dependencies

Hi @Giovanni1085, many thanks for publishing this code and your very useful paper.

Unfortunately, I'm having several issues getting it to run on my machine. I suspect much of this is caused by the lack of a complete list of dependencies. Would you consider adding a more comprehensive list in a requirements.txt?

I'll document the issues I'm having in some other issues.
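To make environment reports easier to compare, a small script can print the installed versions of the main packages in requirements.txt syntax. This is only a sketch. The package list is taken from the pip freeze in the AttributeError issue on this page, not from an official dependency list, and `installed_versions` is a hypothetical helper. On Python 3.8+ it uses the standard library `importlib.metadata`. On Python 3.7 it falls back to the `importlib-metadata` backport, which must be installed separately.

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.7: pip install importlib-metadata
    from importlib_metadata import PackageNotFoundError, version

# Assumed list, copied from a user's pip freeze rather than the project's docs.
PACKAGES = ["Keras", "keras-contrib", "tensorflow", "scikit-learn",
            "sklearn-crfsuite", "numpy"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # Print pinned lines for installed packages and a comment for missing ones.
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}=={ver}" if ver else f"# {name} not installed")
```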

AttributeError: 'Model' object has no attribute 'output_layers'

When running python keras/main_multiTaskLearning.py, I run into AttributeError: 'Model' object has no attribute 'output_layers'.

Full traceback included below:

(virtualenv)  matthew@xps15  ~/Documents/wellcome/LinkedBooksDeepReferenceParsing   master ●  python keras/main_multiTaskLearning.py
WARNING: Logging before flag parsing goes to stderr.
W0720 20:58:02.091192 140559446904960 deprecation_wrapper.py:119] From keras/main_multiTaskLearning.py:6: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

Using TensorFlow backend.
Number of  entries:  828394
Individual entries:  57099
Number of labels:  27
Number of labels:  10
Number of labels:  4
{1: 'abbreviation', 2: 'archivalreference', 3: 'archive_lib', 4: 'attachment', 5: 'author', 6: 'box', 7: 'cartulation', 8: 'column', 9: 'conjunction', 10: 'date', 11: 'filza', 12: 'folder', 13: 'foliation', 14: 'numbered_ref', 15: 'o', 16: 'pagination', 17: 'publicationnumber-year', 18: 'publicationplace', 19: 'publicationspecifications', 20: 'publisher', 21: 'ref', 22: 'registry', 23: 'series', 24: 'title', 25: 'tomo', 26: 'volume', 27: 'year'}
{1: 'b-meta-annotation', 2: 'b-primary', 3: 'b-secondary', 4: 'e-meta-annotation', 5: 'e-primary', 6: 'e-secondary', 7: 'i-meta-annotation', 8: 'i-primary', 9: 'i-secondary', 10: 'o'}
{1: 'b-r', 2: 'e-r', 3: 'i-r', 4: 'o'}
Maximum sequence length - general : 73
Maximum sequence length - data    : 73
Maximum sequence length - general : 73
Maximum sequence length - data    : 30
Maximum sequence length - general : 73
Maximum sequence length - data    : 35
Maximum sequence length - labels : 73
Maximum sequence length - labels : 73
Maximum sequence length - labels : 73
Maximum sequence length - labels : 73
Maximum sequence length - labels : 73
Maximum sequence length - labels : 73
Maximum sequence length - labels : 73
Maximum sequence length - labels : 73
Maximum sequence length - labels : 73
Maximum number of words in a sequence  : 73
Maximum number of characters in a word : 54
====== multi_task start ======
W0720 20:58:23.178692 140559446904960 deprecation_wrapper.py:119] From /home/matthew/Documents/wellcome/LinkedBooksDeepReferenceParsing/build/virtualenv/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:74: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

W0720 20:58:23.178883 140559446904960 deprecation_wrapper.py:119] From /home/matthew/Documents/wellcome/LinkedBooksDeepReferenceParsing/build/virtualenv/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

W0720 20:58:48.890198 140559446904960 deprecation_wrapper.py:119] From /home/matthew/Documents/wellcome/LinkedBooksDeepReferenceParsing/build/virtualenv/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

W0720 20:58:48.894599 140559446904960 deprecation_wrapper.py:119] From /home/matthew/Documents/wellcome/LinkedBooksDeepReferenceParsing/build/virtualenv/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

2019-07-20 20:58:48.894895: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-20 20:58:48.915630: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2904000000 Hz
2019-07-20 20:58:48.916541: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55eca50c1120 executing computations on platform Host. Devices:
2019-07-20 20:58:48.916563: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-07-20 20:58:48.925211: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
W0720 20:58:49.350232 140559446904960 deprecation.py:506] From /home/matthew/Documents/wellcome/LinkedBooksDeepReferenceParsing/build/virtualenv/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
/home/matthew/Documents/wellcome/LinkedBooksDeepReferenceParsing/build/virtualenv/lib/python3.7/site-packages/keras_contrib-2.0.8-py3.7.egg/keras_contrib/layers/crf.py:346: UserWarning: CRF.loss_function is deprecated and it might be removed in the future. Please use losses.crf_loss instead.
/home/matthew/Documents/wellcome/LinkedBooksDeepReferenceParsing/build/virtualenv/lib/python3.7/site-packages/keras_contrib-2.0.8-py3.7.egg/keras_contrib/layers/crf.py:363: UserWarning: CRF.viterbi_acc is deprecated and it might be removed in the future. Please use metrics.viterbi_acc instead.
W0720 20:58:50.320957 140559446904960 deprecation_wrapper.py:119] From /home/matthew/Documents/wellcome/LinkedBooksDeepReferenceParsing/build/virtualenv/lib/python3.7/site-packages/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

W0720 20:58:50.371632 140559446904960 deprecation.py:323] From /home/matthew/Documents/wellcome/LinkedBooksDeepReferenceParsing/build/virtualenv/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py:2403: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Traceback (most recent call last):
  File "keras/main_multiTaskLearning.py", line 87, in <module>
    gen_confusion_matrix=True, early_stopping_patience=5
  File "/home/matthew/Documents/wellcome/LinkedBooksDeepReferenceParsing/keras/code/models.py", line 177, in BiLSTM_model
    hist = model.fit(X_train, y_train, validation_data=[X_test, y_test], epochs=nbr_epochs, batch_size=batch_size, callbacks=callbacks, verbose=2)
  File "/home/matthew/Documents/wellcome/LinkedBooksDeepReferenceParsing/build/virtualenv/lib/python3.7/site-packages/keras/engine/training.py", line 1039, in fit
    validation_steps=validation_steps)
  File "/home/matthew/Documents/wellcome/LinkedBooksDeepReferenceParsing/build/virtualenv/lib/python3.7/site-packages/keras/engine/training_arrays.py", line 127, in fit_loop
    callbacks.on_train_begin()
  File "/home/matthew/Documents/wellcome/LinkedBooksDeepReferenceParsing/build/virtualenv/lib/python3.7/site-packages/keras/callbacks.py", line 132, in on_train_begin
    callback.on_train_begin(logs)
  File "/home/matthew/Documents/wellcome/LinkedBooksDeepReferenceParsing/keras/code/utils.py", line 381, in on_train_begin
    if len(self.model.output_layers) > 1:
AttributeError: 'Model' object has no attribute 'output_layers'

I'm running Python 3.7.0 with the following package versions:

absl-py==0.7.1
astor==0.8.0
cycler==0.10.0
gast==0.2.2
google-pasta==0.1.7
grpcio==1.22.0
h5py==2.9.0
joblib==0.13.2
Keras==2.2.4
Keras-Applications==1.0.8
keras-contrib==2.0.8
Keras-Preprocessing==1.1.0
kiwisolver==1.1.0
Markdown==3.1.1
matplotlib==3.1.1
numpy==1.16.4
protobuf==3.9.0
pyparsing==2.4.0
python-crfsuite==0.9.6
python-dateutil==2.8.0
PyYAML==5.1.1
scikit-learn==0.21.2
scipy==1.3.0
six==1.12.0
sklearn-crfsuite==0.3.6
tabulate==0.8.3
tensorboard==1.14.0
tensorflow==1.14.0
tensorflow-estimator==1.14.0
termcolor==1.1.0
tqdm==4.32.2
Werkzeug==0.15.5
wrapt==1.11.2

Note that my versions do not match those required in README.md. However, when I tried installing scikit-learn==0.19.1 (for example), I immediately ran into installation issues due to version incompatibilities. I think these would mostly be solved with a requirements.txt, hence #3.
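The traceback points at `self.model.output_layers` in keras/code/utils.py. As far as I know, Keras 2.2 moved this attribute to the private `_output_layers`. The public `model.outputs` list, with one tensor per output, is available throughout Keras 2.x. Assuming the callback only needs the number of outputs, one possible workaround is a small compatibility helper. `num_outputs` is hypothetical and has not been tested against this repository.

```python
def num_outputs(model):
    """Return how many outputs a Keras model has, across Keras 2.x versions.

    Hypothetical compatibility helper: prefers the public `outputs` list and
    falls back to the output-layer lists older or newer Keras versions keep.
    """
    # `model.outputs` is the public list of output tensors in Keras 2.x.
    outputs = getattr(model, "outputs", None)
    if outputs is not None:
        return len(outputs)
    # Keras < 2.2 exposed `output_layers`; Keras 2.2+ keeps `_output_layers`.
    for attr in ("output_layers", "_output_layers"):
        layers = getattr(model, attr, None)
        if layers is not None:
            return len(layers)
    raise AttributeError("cannot determine the number of model outputs")
```

With this helper, line 381 of utils.py would read `if num_outputs(self.model) > 1:`. The alternative is to pin an older Keras release in which `output_layers` was still public, but that risks the same version conflicts described above.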
