aonotas / adversarial_text
Code for Adversarial Training Methods for Semi-Supervised Text Classification
Hello, I have a question about dropout in VAT.
If I use dropout in VAT, the output distribution will change even without any perturbation, because each forward pass samples a different dropout mask.
Thanks!
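The dropout concern above can be illustrated with a small standard-library sketch. VAT compares the output distribution on the clean input with the one on the perturbed input using a KL divergence, so any randomness between the two forward passes also shows up in that term. The toy model, the weights, and the helper names (dropout_mask, forward, kl_divergence) are hypothetical and are not taken from this repository. Reusing one mask for both passes is shown only as one possible way to isolate the effect, not as what the repository does.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dropout_mask(n, ratio, rng):
    # Inverted dropout: kept units are scaled by 1 / (1 - ratio).
    scale = 1.0 / (1.0 - ratio)
    return [scale if rng.random() >= ratio else 0.0 for _ in range(n)]


def forward(x, weights, mask):
    # Hypothetical toy classifier: dropout on the input, then a linear layer.
    h = [xi * mi for xi, mi in zip(x, mask)]
    logits = [sum(w * hi for w, hi in zip(row, h)) for row in weights]
    return softmax(logits)


def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


x = [0.5, -1.0, 2.0, 0.3, -0.7, 1.2, 0.9, -0.4]
weights = [
    [0.2, -0.1, 0.4, 0.3, -0.5, 0.1, 0.6, -0.2],
    [-0.3, 0.5, -0.2, 0.1, 0.4, -0.6, 0.2, 0.3],
]

rng = random.Random(0)
mask_a = dropout_mask(len(x), 0.5, rng)
mask_b = dropout_mask(len(x), 0.5, rng)

p_a = forward(x, weights, mask_a)
p_b = forward(x, weights, mask_b)

# Same input, no perturbation, two independently sampled masks.
kl_fresh_masks = kl_divergence(p_a, p_b)
# Same input, no perturbation, the same mask reused for both passes.
kl_shared_mask = kl_divergence(p_a, forward(x, weights, mask_a))

print("KL with fresh masks:", kl_fresh_masks)
print("KL with shared mask:", kl_shared_mask)
```

With independently sampled masks the KL term is nonzero even though the input is unchanged. With a shared mask it is exactly zero, so any remaining divergence would come from the perturbation alone.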
Hi, when I try to run download.sh, I get the following error:
Prepare for IMDB
Prepare script is running...
Traceback (most recent call last):
File "preprocess.py", line 79, in <module>
prepare_imdb()
File "preprocess.py", line 55, in prepare_imdb
imdb_validation_pos_start_id)
File "preprocess.py", line 24, in load_file
words = read_text(filename.strip())
File "preprocess.py", line 11, in read_text
for line in f:
File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 399: ordinal not in range(128)
Then I added encoding='utf-8' to every with open() call in preprocess.py.
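A minimal sketch of that change, assuming a read_text helper shaped roughly like the one named in the traceback. The body here is hypothetical and the real function in preprocess.py may differ. The temporary file exists only to reproduce the 0xc2 byte from the error message.

```python
import os
import tempfile


def read_text(filename):
    # Hypothetical stand-in for the helper in preprocess.py.
    # Decoding explicitly as UTF-8 makes the result independent of the locale.
    with open(filename, encoding="utf-8") as f:
        return [line.strip().split() for line in f]


# Write a small file whose UTF-8 bytes include 0xc2 (the pound sign is 0xc2 0xa3).
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write("caf\u00e9 \u00a3 5\n".encode("utf-8"))
    path = tmp.name

# Reading with the ASCII codec fails in the same way as the traceback.
try:
    with open(path, encoding="ascii") as f:
        f.read()
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

tokens = read_text(path)
os.remove(path)

print(ascii_failed)
print(tokens)
```

Setting the environment variable PYTHONIOENCODING does not affect open(); the default for open() comes from the locale, which is why passing encoding explicitly is the more portable choice.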
After that, I get the following error:
Namespace(adaptive_softmax=1, add_labeld_to_unlabel=1, alpha=0.001, alpha_decay=0.9998, batchsize=32, batchsize_semi=96, clip=5.0, dataset='imdb', debug_mode=0, dropout=0.5, emb_dim=256, eval=0, freeze_word_emb=0, gpu=0, hidden_cls_dim=30, hidden_dim=1024, ignore_unk=1, load_trained_lstm='', lower=0, min_count=1, n_class=2, n_epoch=30, n_layers=1, nl_factor=1.0, norm_sentence_level=1, pretrained_model='imdb_pretrained_lm.model', random_seed=1234, save_name='imdb_model_vat', use_adv=0, use_exp_decay=1, use_rational=0, use_semi_data=1, use_unlabled_to_vocab=1, word_only=0, xi_var=5.0, xi_var_first=1.0)
train_set:71246
avg word number:242.8615501221121
vocab:87008
avg word number (train_x): 242.43914148545608
avg word number (dev_x):239.861747469366
avg word number (test_x):235.59372
lm_words_num:17297560
train_vocab_size: 66825
vocab_inv: 87008
Traceback (most recent call last):
File "train.py", line 354, in <module>
main()
File "train.py", line 164, in main
serializers.load_npz(args.pretrained_model, pretrain_model)
File "/usr/local/lib/python3.6/dist-packages/chainer/serializers/npz.py", line 190, in load_npz
d.load(obj)
File "/usr/local/lib/python3.6/dist-packages/chainer/serializer.py", line 83, in load
obj.serialize(self)
File "/usr/local/lib/python3.6/dist-packages/chainer/link.py", line 997, in serialize
d[name].serialize(serializer[name])
File "/usr/local/lib/python3.6/dist-packages/chainer/link.py", line 651, in serialize
data = serializer(name, param.data)
File "/usr/local/lib/python3.6/dist-packages/chainer/serializers/npz.py", line 150, in __call__
numpy.copyto(value, dataset)
ValueError: could not broadcast input array from shape (86935,256) into shape (87008,256)
My guess is that changing the decoding method causes some lines in the files to be dropped, so the vocabulary no longer matches the pretrained model. Is that right?
Could you suggest a workaround for this issue?
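Reading the traceback, the embedding stored in the pretrained file has 86935 rows while the model built from the freshly preprocessed data expects 87008, which matches the vocab:87008 line in the log. One possible cause, not confirmed by the thread, is that the vocabulary was built from differently decoded text. The sketch below uses made-up sample lines and a hypothetical build_vocab helper to show how the decoding choice alone can change the number of distinct tokens.

```python
def build_vocab(raw_lines, encoding, errors="strict"):
    # Hypothetical helper: whitespace tokenization over decoded byte lines.
    vocab = set()
    for raw in raw_lines:
        vocab.update(raw.decode(encoding, errors=errors).split())
    return vocab


# Made-up sample data, stored as UTF-8 bytes.
raw_lines = [
    "the price is \u00a35 not 5\n".encode("utf-8"),
    "caf\u00e9 cafe\n".encode("utf-8"),
]

vocab_utf8 = build_vocab(raw_lines, "utf-8")
# Dropping non-ASCII bytes merges some tokens and alters others.
vocab_ascii_ignore = build_vocab(raw_lines, "ascii", errors="ignore")

print(len(vocab_utf8), sorted(vocab_utf8))
print(len(vocab_ascii_ignore), sorted(vocab_ascii_ignore))
```

If the two sizes differ on the real data, the pretrained weights cannot be loaded into a model built from the new vocabulary. The vocabulary would need to be rebuilt the same way it was for the pretrained model, or the language model would need to be pretrained again on the new vocabulary.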
Hi, thanks for your work. This is a question, not an issue, so feel free to close it if you want.
I have a dataset of unlabelled call transcriptions, and I want to train a classifier for them. I'm wondering whether I could use adversarial training for this once part of the dataset has been labelled manually.
I look forward to your suggestions. Thanks again!
When I run virtual adversarial training, I get an error in nets.py at this line:
x_data = self.xp.concatenate(x_data, axis=0)
and also in train.py at this line:
output_original = model(x, length)
I got 404 Not Found when I tried to download the pretrained model from:
http://sato-motoki.com/research/vat/imdb_pretrained_lm.model
This is my output:
wget http://sato-motoki.com/research/vat/imdb_pretrained_lm.model
--2020-09-01 16:52:54-- http://sato-motoki.com/research/vat/imdb_pretrained_lm.model
Resolving sato-motoki.com (sato-motoki.com)... 185.199.109.153, 185.199.111.153, 185.199.108.153, ...
Connecting to sato-motoki.com (sato-motoki.com)|185.199.109.153|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2020-09-01 16:52:58 ERROR 404: Not Found.