Comments (8)
Maybe you cut off your data at the wrong point?
Check the last rows of train.txt and valid.txt and make sure there is an empty line at the end and that the last sentences are complete (a sentence is terminated by an empty line after it).
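A quick way to run that check, sketched in plain Python. The helper names `last_line_is_empty` and `file_ends_with_empty_line` are made up for illustration and are not part of anago; the sketch only assumes that an empty line closes a sentence, as described above.

```python
def last_line_is_empty(lines):
    """True if the final line is empty (or whitespace only)."""
    lines = list(lines)
    return bool(lines) and lines[-1].strip() == ""


def file_ends_with_empty_line(path):
    """Apply the check to a file such as train.txt or valid.txt."""
    with open(path, encoding="utf-8") as f:
        return last_line_is_empty(f)


# Toy lines, not taken from the real dataset.
print(last_line_is_empty(["EU B-ORG\n", ". O\n", "\n"]))  # True
print(last_line_is_empty(["EU B-ORG\n", ". O\n"]))        # False
```

If the check returns False, the last sentence of the file is not closed by an empty line.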
from anago.
This error happens when your validation set contains tags that do not exist in your training set.
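To see whether this applies to your data, you can compare the tag sets directly. This is only a sketch: `unseen_tags` is a hypothetical helper, and the toy labels below are invented to mimic the shape of the label lists (one list of tags per sentence).

```python
def unseen_tags(y_train, y_valid):
    """Return the tags that occur in y_valid but never in y_train."""
    train_tags = {tag for sent in y_train for tag in sent}
    valid_tags = {tag for sent in y_valid for tag in sent}
    return valid_tags - train_tags


# Toy labels for illustration only.
y_train = [["B-ORG", "O", "B-MISC"], ["B-PER", "I-PER"]]
y_valid = [["B-MISC", "I-MISC", "O"]]
print(unseen_tags(y_train, y_valid))  # {'I-MISC'}
```

A non-empty result names exactly the tags that would raise the KeyError during validation.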
Since this can also happen in other kinds of machine learning problems, I built a workaround for it:
I defined a new Preprocessor class that adds the tags from the validation set to self.vocab_tag.
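# Assumed import path (preprocess.py appears in the traceback in this thread); adjust to your anago version.
from anago.preprocess import WordPreprocessor


class Preprocessor(WordPreprocessor):

    def fit(self, x_train, y_train, y_valid=None):
        # Build the vocabularies from the training data as usual.
        super().fit(x_train, y_train)
        # Collect every tag that occurs in the validation set (if one is given).
        entities = set()
        for sent in (y_valid if y_valid is not None else []):
            entities.update(sent)
        # Register the tags that the training set did not contain.
        for t in sorted(entities):
            if t not in self.vocab_tag:
                self.vocab_tag[t] = len(self.vocab_tag)
        return self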
You also need a new wrapper class that is almost equivalent to Sequence, but uses your new preprocessor:
# Import paths are assumptions; adjust them to your anago version.
from anago import Sequence
from anago.models import SeqLabeling
from anago.preprocess import filter_embeddings
from anago.trainer import Trainer


class AnagoWrapper(Sequence):

    def train(self, x_train, y_train, x_valid=None, y_valid=None, vocab_init=None):
        # Fit the custom preprocessor so that validation tags are in the vocabulary.
        self.p = Preprocessor(vocab_init=vocab_init).fit(x_train, y_train, y_valid)
        embeddings = filter_embeddings(self.embeddings, self.p.vocab_word, self.model_config.word_embedding_size)
        self.model_config.vocab_size = len(self.p.vocab_word)
        self.model_config.char_vocab_size = len(self.p.vocab_char)
        self.model = SeqLabeling(self.model_config, embeddings, len(self.p.vocab_tag))
        trainer = Trainer(self.model,
                          self.training_config,
                          checkpoint_path=self.log_dir,
                          preprocessor=self.p)
        trainer.train(x_train, y_train, x_valid, y_valid)
Anyway, I am thinking about changing my preprocessor so that it takes a predefined list of tags for self.vocab_tag, because the same error may occur once you test your model and your test set contains tags that do not exist in the training or validation set.
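A rough sketch of that idea, independent of anago: build the tag vocabulary from a fixed tag list instead of from whatever the data happens to contain. The function name `build_tag_vocab`, the `<PAD>` entry at index 0, and the tag list itself are assumptions for illustration; check how your anago version lays out `vocab_tag` before reusing this.

```python
def build_tag_vocab(tags, pad="<PAD>"):
    """Map every tag in a predefined list to a fixed integer index."""
    vocab = {pad: 0}  # assumption: index 0 is reserved for padding
    for tag in tags:
        if tag not in vocab:
            vocab[tag] = len(vocab)
    return vocab


# Assumed tag set for a CoNLL-style NER task; replace with your own.
CONLL_TAGS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG",
              "B-LOC", "I-LOC", "B-MISC", "I-MISC"]

vocab_tag = build_tag_vocab(CONLL_TAGS)
print(vocab_tag["I-MISC"])  # 9
```

With a fixed list, the indices no longer depend on which tags appear in a particular split, so training, validation, and test data all share one mapping.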
from anago.
When I tried with a small dataset, I got the error above, but when I fed in a large dataset, like the one you pushed to this repo, it worked.
So, did you set any limitations on the data size?
from anago.
Does anyone know about this??
from anago.
Probably, sentences is an empty list:
>>> max([], key=len)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
ValueError: max() arg is an empty sequence
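If an empty list is a legitimate input, the call can be guarded with the `default` argument of the built-in `max` (available since Python 3.4). The wrapper name `longest_length` is hypothetical, and this is not how anago itself handles the case:

```python
def longest_length(sentences):
    """Length of the longest sentence, or 0 when there are no sentences."""
    # default=[] is returned instead of raising ValueError on empty input.
    return len(max(sentences, key=len, default=[]))


print(longest_length([["EU", "rejects"], ["Peter"]]))  # 2
print(longest_length([]))                              # 0
```

In this thread an empty list most likely points to a data loading problem, so the guard helps with diagnosis more than it fixes anything.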
from anago.
Hi Hironsan and bode94
Thank you for your comments. I knew that, but I didn't know why it caused this error. Anyway, bode94 is right.
I didn't put the empty line at the end of the training data.
That's why it works with the distributed dataset but didn't work with mine.
Thank you both!
Hope you are doing well!
Best,
Rowing0914
from anago.
Hmm, even though I put an empty line at the end of the training data, the issue was not solved.
I think I exactly replicated the format of the original training data given in the directory.
Traceback (most recent call last):
  File "test.py", line 9, in <module>
    model.train(x_train, y_train, x_valid, y_valid)
  File "/Users/norio.kosaka/anaconda3/envs/py36/lib/python3.6/site-packages/anago/wrapper.py", line 50, in train
    trainer.train(x_train, y_train, x_valid, y_valid)
  File "/Users/norio.kosaka/anaconda3/envs/py36/lib/python3.6/site-packages/anago/trainer.py", line 51, in train
    callbacks=callbacks)
  File "/Users/norio.kosaka/anaconda3/envs/py36/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/Users/norio.kosaka/anaconda3/envs/py36/lib/python3.6/site-packages/keras/engine/training.py", line 2213, in fit_generator
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/Users/norio.kosaka/anaconda3/envs/py36/lib/python3.6/site-packages/keras/callbacks.py", line 76, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/Users/norio.kosaka/anaconda3/envs/py36/lib/python3.6/site-packages/anago/metrics.py", line 124, in on_epoch_end
    for i, (data, label) in enumerate(self.valid_batches):
  File "/Users/norio.kosaka/anaconda3/envs/py36/lib/python3.6/site-packages/anago/reader.py", line 150, in data_generator
    yield preprocessor.transform(X, y)
  File "/Users/norio.kosaka/anaconda3/envs/py36/lib/python3.6/site-packages/anago/preprocess.py", line 115, in transform
    y = [[self.vocab_tag[t] for t in sent] for sent in y]
  File "/Users/norio.kosaka/anaconda3/envs/py36/lib/python3.6/site-packages/anago/preprocess.py", line 115, in <listcomp>
    y = [[self.vocab_tag[t] for t in sent] for sent in y]
  File "/Users/norio.kosaka/anaconda3/envs/py36/lib/python3.6/site-packages/anago/preprocess.py", line 115, in <listcomp>
    y = [[self.vocab_tag[t] for t in sent] for sent in y]
KeyError: 'I-MISC'
import anago
from anago.reader import load_data_and_labels
x_train, y_train = load_data_and_labels('../data/conll2003/en/ner/train_1.txt')
x_valid, y_valid = load_data_and_labels('../data/conll2003/en/ner/valid_1.txt')
x_test, y_test = load_data_and_labels('../data/conll2003/en/ner/test_1.txt')
model = anago.Sequence()
model.train(x_train, y_train, x_valid, y_valid)
model.eval(x_test, y_test)
words = 'President Obama is speaking at the White House.'.split()
model.analyze(words)
train.txt
EU B-ORG
rejects O
German B-MISC
call O
to O
boycott O
British B-MISC
lamb O
. O
Peter B-PER
Blackburn I-PER
BRUSSELS B-LOC
1996-08-22 O
The O
European B-ORG
Commission I-ORG
said O
on O
Thursday O
it O
disagreed O
with O
German B-MISC
advice O
to O
consumers O
to O
shun O
British B-MISC
lamb O
until O
scientists O
determine O
whether O
mad O
cow O
disease O
can O
be O
transmitted O
to O
sheep O
. O
Germany B-LOC
's O
representative O
to O
the O
European B-ORG
Union I-ORG
's O
veterinary O
committee O
Werner B-PER
Zwingmann I-PER
said O
on O
Wednesday O
consumers O
should O
buy O
sheepmeat O
from O
countries O
other O
than O
Britain B-LOC
until O
the O
scientific O
advice O
was O
clearer O
. O
" O
We O
do O
n't O
support O
any O
such O
recommendation O
because O
we O
do O
n't O
see O
any O
grounds O
for O
it O
, O
" O
the O
Commission B-ORG
. O
So please tell me the proper format for the dataset.
There is no description of it.
from anago.
Hi bode94
Thank you for your prompt action!
Oh, yeah, it's probably because I created the datasets using head -n 100 train/test/valid.txt > train/test/valid_1.txt
So the first 100 lines of each file probably contain different parts of the data...
Now I got it!!
Thank you so much for your contribution as well!
let me check!
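For building a small sample without cutting a sentence in half, one option is to take the first n sentences rather than the first n lines. A minimal sketch, assuming sentences are separated by empty lines; `iter_sentences` and `head_sentences` are hypothetical helpers, not anago functions:

```python
from itertools import islice


def iter_sentences(lines):
    """Yield each sentence as a list of its non-empty lines."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if line.strip():
            sentence.append(line)
        elif sentence:
            yield sentence
            sentence = []
    if sentence:  # last sentence without a closing empty line
        yield sentence


def head_sentences(lines, n):
    """Return text holding the first n sentences, each closed by an empty line."""
    first = islice(iter_sentences(lines), n)
    return "".join("\n".join(sent) + "\n\n" for sent in first)


# Toy input for illustration; the third sentence is dropped.
sample = ["EU B-ORG\n", "rejects O\n", "\n",
          "Peter B-PER\n", "Blackburn I-PER\n", "\n",
          "BRUSSELS B-LOC\n"]
print(repr(head_sentences(sample, 2)))
# 'EU B-ORG\nrejects O\n\nPeter B-PER\nBlackburn I-PER\n\n'
```

Cutting at sentence boundaries keeps the last sentence complete, but a small sample can still miss tags that occur in the validation or test file.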
from anago.