# vgtomahawk / Charmanteau-CamReady

Code for "CharManteau: Character Embedding Models For Portmanteau Creation" (EMNLP 2017). Varun Gangal*, Harsh Jhamtani*, Graham Neubig, Eduard Hovy, Eric Nyberg.
After downgrading some libraries and fixing some path errors, I'm able to run `python barebones_enc_dec.py HOLDOUTTEST --dynet-mem 7000 --dynet-seed 786786` as described in README_CODE.txt. However, after constructing and training the network, it fails when attempting to save the model:
```
[dynet] random seed: 786786
[dynet] allocating memory: 7000MB
[dynet] memory allocation done.
Training fold 0
321
321
40
40
['/Users/Macbook/code/Charmanteau-CamReady/Code/language_model/', '/Users/Macbook/code/Charmanteau-CamReady/Code', '/usr/local/lib/python2.7/site-packages', '/Users/Macbook/code/Charmanteau-CamReady/Code', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python27.zip', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/plat-darwin', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/plat-mac', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/plat-mac/lib-scriptpackages', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/lib-tk', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/lib-old', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/lib-dynload', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/site-packages', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/site-packages/pgmpy-0.1.6-py2.7.egg', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/site-packages/dyNET-0.0.0-py2.7-macosx-10.12-x86_64.egg']
MAX_SEQUENCE_LENGTH= 60
MAX_VOCAB_SIZE = 1500
embeddings_dim = 50
Using TensorFlow backend.
--- Loading CMU data
length of cmu data 133784
A couple of samples...
a
a(1)
a's
------------
Ignoring MAX_VOCAB_SIZE
Found vocab size = 48
Printing few sample sequences...
[1 3 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[ 1  3 13 27 19 26 15  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0]
[ 1  3 13 21 14 27  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0]
params['embeddings_dim'] = 50
lstm_cell_size= 100
/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/site-packages/keras/layers/core.py:1206: UserWarning: `TimeDistributedDense` is deprecated, And will be removed on May 1st, 2017. Please use a `Dense` layer instead.
  warnings.warn('`TimeDistributedDense` is deprecated, '
```
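Incidentally, the `TimeDistributedDense` deprecation warning in the log has a straightforward replacement: wrap a `Dense` layer in `TimeDistributed`. A minimal sketch, assuming a Keras 2.x / `tf.keras` API rather than the Keras 1.x version the repo was written against, with dimensions copied from the model summary printed in the log:

```python
# Sketch only: rebuilds the logged layer stack with the non-deprecated API.
# Dimensions are taken from the model summary in the log; input_dim=51 is
# inferred from the embedding's 2550 parameters (51 * 50).
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense, TimeDistributed
from tensorflow.keras.models import Model

inp = Input(shape=(59,), name="inp")
x = Embedding(input_dim=51, output_dim=50)(inp)  # 2,550 params, as in embedding_1
x = LSTM(100, return_sequences=True)(x)          # 60,400 params, as in lstm_1
out = TimeDistributed(Dense(50))(x)              # 5,050 params; replaces the
                                                 # deprecated TimeDistributedDense
model = Model(inp, out)
```

This reproduces the same 68,000-parameter network, so it should be behavior-preserving; only the layer class changes.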
```
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
inp (InputLayer)                 (None, 59)            0
____________________________________________________________________________________________________
embedding_1 (Embedding)          (None, 59, 50)        2550        inp[0][0]
____________________________________________________________________________________________________
lstm_1 (LSTM)                    (None, 59, 100)       60400       embedding_1[0][0]
____________________________________________________________________________________________________
timedistributeddense_1 (TimeDist (None, 59, 50)        5050        lstm_1[0][0]
====================================================================================================
Total params: 68,000
Trainable params: 68,000
Non-trainable params: 0
____________________________________________________________________________________________________
None
2018-10-11 22:11:23.911843: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-10-11 22:11:23.911864: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-10-11 22:11:23.911870: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2018-10-11 22:11:23.911876: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Loaded cache
Initializing Blind Cache
Initialized Blind Cache
Size of Blind Cache: 1624
The dy.parameter(...) call is now DEPRECATED.
There is no longer need to explicitly add parameters to the computation graph.
Any used parameter will be added automatically.
Trainer::update_epoch has been deprecated and doesn't do anything. Please remove it from your code, and control the learning rate of the trainer directly, for example by: 'trainer.learning_rate /= (1 - rate_decay)', see https://github.com/clab/dynet/pull/695 for details.
[previous message repeated a further 5 times]
```
```
Saving Model
Traceback (most recent call last):
  File "barebones_enc_dec.py", line 1545, in <module>
    predictor.train(interEpochPrinting=False)
  File "barebones_enc_dec.py", line 1235, in train
    self.save_model()
  File "barebones_enc_dec.py", line 942, in save_model
    self.model.save(self.modelFile,[self.encoder,self.revcoder,self.decoder,self.encoder_params["lookup"],self.decoder_params["lookup"],self.decoder_params["R"],self.decoder_params["bias"]])
  File "_dynet.pyx", line 1448, in _dynet.ParameterCollection.save
  File "_dynet.pyx", line 1505, in _dynet.ParameterCollection.write_to_textfile
AttributeError: 'list' object has no attribute 'encode'
Dumped cache
```
The presence of harmful slurs severely hampers the usefulness of the dataset—I am unable to easily use it as a teaching resource, because I don't want to expose students to words that might harm them. It's useless to me as a resource for training or evaluating models, since I don't want my models to perpetuate harmful language, nor do I want to positively evaluate my models on their ability to perpetuate such language. (There are slurs in the dataset that have been used in violent ways against me and people like me in particular, and I'm not eager to see those pop up in the output of my own experiments.)
The paper indicates that these portmanteau words were "manually" collected, which makes it seem like they were hand-picked—if that's the case, then it should not affect the integrity of the research if you simply choose to include some examples and leave others out.
(Note that I'm not saying that it's illegitimate to study words with offensive, violent, harmful content—but the dataset should be clearly labeled as containing such content. Moreover, in my opinion there should be well-reasoned, published criteria for deciding what is included in the dataset, so that other researchers can make judgments on whether the dataset is appropriate for their uses, or propose and compare different criteria for creating their own datasets.)