
sltk's Issues

GloVe data question

That is, the txt file has to begin with a 'vocabulary-size vector-dimension' line. How has everyone dealt with this?

Running ./test.sh raises RuntimeError: value cannot be converted to type uint8_t without overflow: -1

Traceback (most recent call last):
File "../test.py", line 77, in <module>
targets_list = sl_model.predict(sample_batched)
File "/xunku/SLTK-master/TorchNN/layers/bilstm_crf.py", line 106, in predict
path_score, best_paths = self.crf(lstm_feats, mask)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/xunku/SLTK-master/TorchNN/layers/crf.py", line 182, in forward
path_score, best_path = self._viterbi_decode(feats, mask)
File "/xunku/SLTK-master/TorchNN/layers/crf.py", line 130, in _viterbi_decode
mask = 1 + (-1) * mask
RuntimeError: value cannot be converted to type uint8_t without overflow: -1

Does anyone know what is going on here?
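
Reading the traceback, `mask` appears to be a `uint8` (Byte) tensor, and PyTorch refuses to combine it with the Python scalar `-1` because `-1` cannot be represented in `uint8`. A minimal sketch of an inversion that avoids negative scalars, assuming `mask` holds only 0s and 1s (the tensor values are made up for illustration, and this is not necessarily how the author resolved it):

```python
import torch

# Hypothetical padding mask: 1 marks a real token, 0 marks padding.
mask = torch.tensor([[1, 1, 0], [1, 0, 0]], dtype=torch.uint8)

# Invert by comparison instead of `1 + (-1) * mask`, so no negative
# scalar ever has to be cast to uint8.
inverted = mask == 0
```

If later code expects a Byte tensor rather than the comparison result, `inverted.to(torch.uint8)` converts it back.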

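Error report

Hello, I get an error when testing with my own file. The file has only two columns, word and tag, so I used word.yml as the config file. The data was originally BIO-tagged and I converted it to BIESO with your tool, thank you very much! I am on torch 0.4.1.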

The error output is as follows:
Reading files...
./data/output.txt: 619
./data/output.txt: 619
Extracting pretrained word embeddings...
Feature word uses pretrained word embeddings ./data/resources/glove.6B.100d.txt:
C:\Users\ma\AppData\Local\Programs\Python\Python35\lib\site-packages\gensim\utils.py:1209: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
Exact match: 2038 / 2715
Fuzzy match: 356 / 2715
OOV: 321 / 2715
convert data to hdf5...
./data/output.txt.hdf5: 619
./data/output.txt.hdf5: 619
SLModel(
(word_feature_layer): WordFeature(
(feature_embedding_list): ModuleList(
(0): Embedding(2716, 100)
)
)
(char_feature_layer): CharFeature(
(char_embedding): Embedding(64, 30)
(char_encoders): ModuleList(
(0): Conv3d(1, 30, kernel_size=(1, 3, 30), stride=(1, 1, 1))
)
)
(dropout_feature): Dropout(p=0.5)
(rnn_layer): RNN(
(rnn): LSTM(130, 100, bidirectional=True)
)
(dropout_rnn): Dropout(p=0.5)
(crf_layer): CRF()
(hidden2tag): Linear(in_features=200, out_features=8, bias=True)
)
learning rate: 0.015
Epoch 1 / 1000: 557 / 557
Traceback (most recent call last):
File "G:/phd/8.8/SLTK-master/main.py", line 584, in <module>
main()
File "G:/phd/8.8/SLTK-master/main.py", line 578, in main
train_model(configs)
File "G:/phd/8.8/SLTK-master/main.py", line 539, in train_model
model_trainer.fit()
File "G:\phd\8.8\SLTK-master\sltk\train\sequence_labeling_trainer.py", line 88, in fit
logits = self.model(**feed_tensor_dict)
File "C:\Users\ma\AppData\Local\Programs\Python\Python35\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "G:\phd\8.8\SLTK-master\sltk\nn\modules\sequence_labeling_model.py", line 130, in forward
word_feature = torch.cat([word_feature, char_feature], 2)
RuntimeError: invalid argument 0: Tensors must have same number of dimensions: got 3 and 2 at c:\new-builder_2\win-wheel\pytorch\aten\src\th\generic/THTensorMath.cpp:3607

I don't quite understand this. Thank you for any pointers.

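What is a feature vocabulary?

@liu-nlper Hello, when I run the code it reports that this file is missing:
FileNotFoundError: [Errno 2] No such file or directory: './data/alphabet/word.pkl'
What exactly is a feature vocabulary? Does it mean a dictionary I have to find externally? I don't know what format the vocabulary needs to be in. Sorry for the trouble.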
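
In sequence labeling toolkits, a "feature vocabulary" usually means a mapping from each feature value seen in the training data (for example each word) to an integer id, produced by a preprocessing step rather than downloaded. The sketch below illustrates that general idea only. The reserved ids, the function name `build_vocab`, and the pickle layout are assumptions, not SLTK's actual format for `word.pkl`.

```python
import pickle
from collections import Counter

def build_vocab(sentences, min_count=1):
    """Map each token seen at least `min_count` times to an integer id."""
    counts = Counter(token for sent in sentences for token in sent)
    vocab = {"<pad>": 0, "<unk>": 1}  # reserved ids (an assumption)
    for token, count in counts.most_common():
        if count >= min_count:
            vocab[token] = len(vocab)
    return vocab

# Toy training sentences, invented for illustration.
sentences = [["EU", "rejects", "German", "call"], ["German", "call"]]
vocab = build_vocab(sentences)

# Serialize and restore the mapping, as a .pkl file on disk would.
payload = pickle.dumps(vocab)
restored = pickle.loads(payload)
```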

mask is required to be of Byte type

img_20190121_113059
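The mask is required to be of Byte type, but I still get an error after converting the LongTensor to a ByteTensor, and converting via numpy fails as well. Could the author say how this was handled at the time?
Looking forward to your reply, thank you.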

Follow-up question: in Chinese tagging tasks, is a large OOV count with pretrained word embeddings normal?

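Another question for the author: I am training a Chinese tagging model and found that, when matching against the pretrained word embeddings, the OOV share is very large. Is that because Chinese word embeddings mostly cover segmented two-, three-, and four-character words, while the tokens in the training corpus train.txt are all single characters, so many of them end up OOV? Is the situation below normal, and can I go ahead with training?
Extracting pretrained word embeddings...
Feature word uses pretrained word embeddings ./data/word2vec.txt:
Exact match: 3365 / 7099
Fuzzy match: 4 / 7099
OOV: 3730 / 7099
Thanks in advance!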
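
One way to test that hypothesis is to measure how many distinct corpus tokens have an entry in the embedding file. The sketch below counts exact matches only (it does not reproduce the toolkit's fuzzy matching), and the tiny vocabularies are invented for illustration.

```python
def embedding_coverage(corpus_tokens, embedding_vocab):
    """Return (matched, oov) counts over the distinct corpus tokens."""
    distinct = set(corpus_tokens)
    matched = sum(1 for tok in distinct if tok in embedding_vocab)
    return matched, len(distinct) - matched

# Word-level embedding vocabulary vs. a character-level corpus.
embedding_vocab = {"北京", "大学", "我", "在"}
corpus_tokens = list("我在北京大学")  # one token per character

matched, oov = embedding_coverage(corpus_tokens, embedding_vocab)
print(matched, oov)  # prints "2 4"
```

If the OOV tokens turn out to be mostly single characters, embeddings trained at the character level are likely a better match for a character-tagged corpus.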

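No hdf5 file when running NER with a trained model

@liu-nlper Hello, I am working on an NER task. When I use the trained model to find entities in a raw data file, it requires an hdf5 file with the corresponding name. When I renamed the raw data file to match an existing hdf5 file, the results were very poor. The test data does come from a different domain, though it contains some of the same entities.
I can't tell whether the cause is that the training data and the target data belong to different domains (with overlapping entities) or a problem with the hdf5 file. I would appreciate an answer.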

The CRF loss appears to apply batch averaging twice

Hello, while reading the code I noticed that the neg_log_likelihood_loss function in crf.py contains:
    if self.average_batch:
        return (forward_score - gold_score) / batch_size
    return forward_score - gold_score
and the loss function in sequence_labeling_model.py, which calls it, also contains:
    if not self.use_crf:
        batch_size, max_len = feats.size(0), feats.size(1)
        lstm_feats = feats.view(batch_size * max_len, -1)
        tags = tags.view(-1)
        return self.loss_function(lstm_feats, tags)
    else:
        loss_value = self.loss_function(feats, mask, tags)
        print ('loss_value:', loss_value)
        if self.average_batch:
            batch_size = feats.size(0)
            loss_value /= float(batch_size)
        return loss_value
Doesn't this take the batch average one time too many?
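
The concern can be checked with plain arithmetic: if both divisions run, the summed loss is scaled by 1 / batch_size² rather than 1 / batch_size. A minimal numeric sketch with made-up values follows. Whether both branches are actually active together depends on how `average_batch` is passed to each class, which the quoted snippets do not show.

```python
batch_size = 4
summed_loss = 20.0  # hypothetical forward_score - gold_score over the batch

averaged_once = summed_loss / batch_size     # the intended per-sample loss
averaged_twice = averaged_once / batch_size  # what a second division yields

print(averaged_once, averaged_twice)  # prints "5.0 1.25"
```

A second division would not change which parameters minimize the loss, but it shrinks every gradient by a factor of `batch_size`, which for plain SGD behaves like a proportionally smaller learning rate.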

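About the GloVe embedding format

"Embedding download: glove.6B.zip. The vectors must be converted to the word2vec format, i.e. the txt file needs a 'vocabulary-size vector-dimension' line at the top."
Could the author say what this header line looks like? I am using 100-dimensional GloVe and don't know how to change the file.
Thanks a lot!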
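
In the word2vec text format, the first line holds two integers, the number of vectors and their dimension, separated by a space; the remaining lines are the same `word v1 v2 ...` rows that GloVe already uses. A minimal sketch that writes such a header, assuming a space-separated GloVe file with one vector per line (the function name and the output path are hypothetical):

```python
def add_word2vec_header(glove_path, output_path):
    """Copy a GloVe txt file, prepending a 'vocab_size dim' header line."""
    vocab_size, dim = 0, 0
    # First pass: count the vectors and read the dimension off the first row.
    with open(glove_path, encoding="utf-8") as src:
        for line in src:
            if not line.strip():
                continue
            if vocab_size == 0:
                dim = len(line.rstrip().split(" ")) - 1  # first field is the word
            vocab_size += 1
    # Second pass: write the header, then stream the rows through unchanged.
    with open(glove_path, encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        dst.write("{} {}\n".format(vocab_size, dim))
        for line in src:
            if line.strip():
                dst.write(line.rstrip("\n") + "\n")
    return vocab_size, dim
```

Calling `add_word2vec_header("glove.6B.100d.txt", "glove.6B.100d.w2v.txt")` should produce a file whose first line is `400000 100`, given that the 6B release has a 400K-word vocabulary.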

Question about model accuracy

Hello, does your code not provide F1 score computation on the dev and test sets? How do you judge the accuracy of a model built with your code?
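
For sequence labeling, F1 is commonly computed at the entity level by comparing predicted spans with gold spans. The sketch below is a generic illustration for BIESO-style tags, not code from SLTK; the function names and the toy tag sequences are made up.

```python
def extract_spans(tags):
    """Return a set of (start, end, label) spans from B/I/E/S/O tags."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(tags):
        prefix, _, name = tag.partition("-")
        if prefix == "S":
            spans.add((i, i, name))
            start, label = None, None
        elif prefix == "B":
            start, label = i, name
        elif prefix == "E" and start is not None and name == label:
            spans.add((start, i, name))
            start, label = None, None
        elif prefix != "I" or name != label:
            start, label = None, None  # "O" or an inconsistent tag
    return spans

def span_f1(gold_tags, pred_tags):
    """Entity-level F1 for one tag sequence (exact span and label match)."""
    gold, pred = extract_spans(gold_tags), extract_spans(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = ["B-PER", "E-PER", "O", "S-LOC"]
pred = ["B-PER", "E-PER", "O", "O"]
score = span_f1(gold, pred)  # precision 1.0, recall 0.5
```

For a whole dev or test set, the correct, predicted, and gold span counts would be summed over all sentences before computing precision and recall, rather than averaging per-sentence scores.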

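Training is very slow on slightly longer sentences

I am using Weibo posts as the training corpus. A sentence of about 130 characters takes several minutes to train, with batch size = 1 on CPU. Is this normal?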

RuntimeError: dimension specified as 0 but tensor has no dimensions

Hi, I am running your code on PyTorch 0.3. It runs successfully on your dataset, but on my own dataset, or on data made from the first 200 lines of your dataset, it fails with the error:
"RuntimeError: dimension specified as 0 but tensor has no dimensions"

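Out of GPU memory on a large dataset

Hello! I want to replace your data with about 10 MB of my own training data, but every run runs out of GPU memory. How can I solve this?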
