
NCRF++, a Neural Sequence Labeling Toolkit. Easy to use for any sequence labeling task (e.g. NER, POS, segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.

License: Apache License 2.0



NCRF++ Logo

NCRF++: An Open-source Neural Sequence Labeling Toolkit

Introduction

Sequence labeling models are quite popular in many NLP tasks, such as Named Entity Recognition (NER), part-of-speech (POS) tagging and word segmentation. State-of-the-art sequence labeling models mostly utilize a CRF structure on top of word-level features. LSTMs (or bidirectional LSTMs) are popular deep learning feature extractors for sequence labeling, and CNNs can also be used thanks to their faster computation. In addition, sub-word features are useful for representing words; these can be captured by a character LSTM, a character CNN, or handcrafted neural features.

NCRF++ is a PyTorch-based framework with flexible choices of input features and output structures. The design of neural sequence labeling models with NCRF++ is fully configurable through a configuration file, without requiring any code changes. NCRF++ can be regarded as a neural network version of CRF++, the well-known statistical CRF framework.

This framework was accepted by ACL 2018 as a demonstration paper, and a detailed experimental report and analysis using NCRF++ was accepted at COLING 2018 as the best paper.

NCRF++ supports different structure combinations on three levels: character sequence representation, word sequence representation and inference layer.

  • Character sequence representation: character LSTM, character GRU, character CNN and handcrafted word features.
  • Word sequence representation: word LSTM, word GRU, word CNN.
  • Inference layer: Softmax, CRF.

Welcome to star this repository!

Requirement

Python: 2 or 3  
PyTorch: 1.0 

A PyTorch 0.3 compatible version is here.

Advantages

  • Fully configurable: all the neural model structures can be set with a configuration file.
  • State-of-the-art performance: models built with NCRF++ give results comparable to or better than state-of-the-art models.
  • Flexible with features: users can define their own features and pretrained feature embeddings.
  • Fast running speed: NCRF++ uses fully batched operations, making the system efficient on GPU (>1000 sents/s for training and >2000 sents/s for decoding).
  • N-best output: NCRF++ supports n-best decoding (with probabilities).

Usage

NCRF++ supports designing the neural network structure through a configuration file. The program can run in two modes: training and decoding. (Sample configurations and data are included in this repository.)

In training mode: python main.py --config demo.train.config

In decoding mode: python main.py --config demo.decode.config

The configuration file controls the network structure, I/O, training settings and hyperparameters.

Detailed configurations and explanations are listed here.
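For orientation, here is a minimal sketch of what a training configuration can look like. The keys follow the configuration format used by NCRF++ (the same keys appear in the sample configs quoted in the issues below); the paths and values here are illustrative placeholders:

train_dir=sample_data/train.bmes
dev_dir=sample_data/dev.bmes
test_dir=sample_data/test.bmes
model_dir=sample_data/demo
status=train
use_crf=True
use_char=True
char_seq_feature=CNN
word_seq_feature=LSTM
iteration=1
batch_size=10
learning_rate=0.015

Changing, say, char_seq_feature from CNN to LSTM swaps the character encoder; no code changes are needed.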

NCRF++ is designed in three layers (shown below): a character sequence layer, a word sequence layer and an inference layer. Using the configuration file, most state-of-the-art models can be easily replicated without coding. On the other hand, users can extend each layer by designing their own modules (for example, neural structures other than CNN/LSTM/GRU). This layer-wise design makes module extension convenient; instructions for extending modules can be found here.

(Figure: the three-layer NCRF++ architecture: character sequence layer, word sequence layer and inference layer.)

Data Format

  • You can refer to the data format in sample_data.
  • NCRF++ supports both the BIO and BIOES (BMES) tag schemes; see the short example after this list.
  • Notice that the IOB format (different from BIO) is currently not supported, because this tag scheme is old and works worse than the other schemes (Reimers and Gurevych, 2017).
  • The difference among these three tag schemes is explained in this paper.
  • I have written a script which converts among the IOB/BIO/BIOES tag schemes. Feel free to try it.
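As a quick illustration (constructed for this list, not taken verbatim from sample_data), here is the same location entity under the two supported schemes; BIOES additionally marks the last token of a multi-token entity with E- and single-token entities with S-:

BIO:                BIOES:
United   B-LOC      United   B-LOC
Arab     I-LOC      Arab     I-LOC
Emirates I-LOC      Emirates E-LOC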

Performance

Results on the CoNLL 2003 English NER task are better than or comparable with SOTA results using the same structures.

CharLSTM+WordLSTM+CRF: 91.20 vs 90.94 of Lample et al., NAACL 2016;

CharCNN+WordLSTM+CRF: 91.35 vs 91.21 of Ma et al., ACL 2016.

By default, LSTM is bidirectional LSTM.

ID  Model         Nochar  CharLSTM  CharCNN
1   WordLSTM      88.57   90.84     90.73
2   WordLSTM+CRF  89.45   91.20     91.35
3   WordCNN       88.56   90.46     90.30
4   WordCNN+CRF   88.90   90.70     90.43

We have compared twelve neural sequence labeling models ({charLSTM, charCNN, None} x {wordLSTM, wordCNN} x {softmax, CRF}) on three benchmarks (POS, Chunking, NER) under statistically controlled experiments; detailed results and comparisons can be found in our COLING 2018 paper, Design Challenges and Misconceptions in Neural Sequence Labeling.

Add Handcrafted Features

NCRF++ integrates several SOTA neural character sequence feature extractors: CNN (Ma et al., ACL 2016), LSTM (Lample et al., NAACL 2016) and GRU (Yang et al., ICLR 2017). In addition, handcrafted features have been proven important in sequence labeling tasks. NCRF++ allows users to design their own features, such as capitalization, POS tags or any other features (the grey circles in the figure above). Users can configure self-defined features through the configuration file (feature embedding size, pretrained feature embeddings, etc.). A sample input data format is given in train.cappos.bmes, which includes two human-defined features, [POS] and [Cap]. ([POS] and [Cap] are just examples; you can give your feature any name you want, following the format [xx], and configure a feature with the same name in the configuration file.) Each feature can be configured in the configuration file as follows:

feature=[POS] emb_size=20 emb_dir=%your_pretrained_POS_embedding
feature=[Cap] emb_size=20 emb_dir=%your_pretrained_Cap_embedding

Features without pretrained embeddings will be randomly initialized.
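In the training data, each token then carries its feature values between the word and the gold label, one token per line. For reference, a line in the train.cappos.bmes format described above looks like:

Friday [Cap]1 [POS]NNP O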

Speed

NCRF++ is implemented with fully batched calculation, making it quite efficient for both model training and decoding. With the help of a GPU (Nvidia GTX 1080) and a large batch size, an LSTM-CRF model built with NCRF++ can reach 1000 sents/s in training and 2000 sents/s in decoding.

(Figure: training and decoding speed comparison.)

N best Decoding

A traditional CRF structure decodes only the single label sequence with the largest probability (i.e. 1-best output). NCRF++ offers a wider choice: it can decode the n label sequences with the top-n probabilities (i.e. n-best output). N-best decoding has long been supported by several popular statistical CRF frameworks; however, to the best of our knowledge, NCRF++ is the first toolkit to support n-best decoding in neural CRF models.

In our implementation, with nbest=10, a CharCNN+WordLSTM+CRF model built in NCRF++ gives a 97.47% oracle F1-value (F1 = 91.35% when nbest=1) on the CoNLL 2003 NER task.
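The sample configuration files expose an nbest key for this; a decode configuration requesting the top 10 label sequences would then contain, for example:

nbest=10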

(Figure: oracle F1-value as a function of n in n-best decoding.)

Reproduce Paper Results and Hyperparameter Tuning

To reproduce the results in our COLING 2018 paper, you only need to change iteration=1 to iteration=100 in the configuration file demo.train.config and set your file directories in the same file. The default configuration describes the Char CNN + Word LSTM + CRF model; you can build your own model by modifying the configuration accordingly. The parameters in this demo configuration file are the same as in our paper. (Note that the Word CNN based models need slightly different parameters; details can be found in our COLING paper.)

If you want to use this framework in new tasks or datasets, here are some tuning tips by @Victor0118.

Report Issue or Problem

If you want to report an issue or ask a question, please attach the following materials if necessary. With this information, I can give fast and accurate suggestions.

  • log file
  • config file
  • sample data

Cite

If you use NCRF++ in your paper, please cite our ACL demo paper:

@inproceedings{yang2018ncrf,
 title={NCRF++: An Open-source Neural Sequence Labeling Toolkit},
 author={Yang, Jie and Zhang, Yue},
 booktitle={Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics},
 url={http://aclweb.org/anthology/P18-4013},
 year={2018}
}

If you use the experimental results and analysis of NCRF++, please cite our COLING paper:

@inproceedings{yang2018design,
 title={Design Challenges and Misconceptions in Neural Sequence Labeling},
 author={Yang, Jie and Liang, Shuailong and Zhang, Yue},
 booktitle={Proceedings of the 27th International Conference on Computational Linguistics (COLING)},
 url={http://aclweb.org/anthology/C18-1327},
 year={2018}
}

Future Plan

  • Document classification (working)
  • Support API usage
  • Upload trained model on Word Segmentation/POS tagging/NER
  • Enable loading pretrained ELMo parameters
  • Add BERT feature extraction layer

Update

  • 2018-Dec-17, NCRF++ v0.2, support PyTorch 1.0
  • 2018-Mar-30, NCRF++ v0.1, initial version
  • 2018-Jan-06, add result comparison.
  • 2018-Jan-02, support character feature selection.
  • 2017-Dec-06, init version

Contributors

abbottlane-zz, frostming, jiesutd, ljch2018, tagucci, twhughes, Victor0118


ncrfpp's Issues

Problem of using glove 100 on Windows

If you use glove 100 on Windows, you will probably get errors about gbk encoding.
The solution is to change the code in functions.py, line 128:
with open(embedding_path, 'r') as file:
by adding encoding="utf-8", like the following:
with open(embedding_path, 'r', encoding="utf-8") as file:

Some problems with the dataset.

Excuse me, I can't find the CoNLL data with BIOES tags. Where can I get the data to reproduce your score?
The score with BIO tags is somewhat worse than with BIOES.

Can you give me a URL to download the data?

Thanks a lot for your help.

Deterministic Training Behaviour

Hi,
What am I doing wrong, given that training with the same hyperparameters always results in the same performance? There should be some variance because of the random initialization, or am I wrong?

Config looks like this:

train_dir=.../conll2003/en/ner/train.txt
dev_dir=.../conll2003/en/ner/valid.txt
test_dir=.../conll2003/en/ner/test.txt
model_dir=test.model
word_emb_dir=.../sample_data/sample.word.emb
norm_word_emb=False
norm_char_emb=False
number_normalized=True
seg=True
word_emb_dim=50
char_emb_dim=30
use_crf=True
use_char=True
word_seq_feature=LSTM
char_seq_feature=CNN
nbest=1
status=train
optimizer=Adam
iteration=1
batch_size=8
ave_batch_loss=False
cnn_layer=4
char_hidden_dim=30
hidden_dim=200
dropout=0.5
lstm_layer=2
bilstm=True
learning_rate=0.002
lr_decay=0.05
momentum=0
l2=1e-08
gpu=True
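For what it's worth, bit-identical results across runs usually mean a random seed is fixed somewhere in the code. A minimal, NCRF++-agnostic sketch of varying initialization per run in PyTorch (assuming you are free to add this before building the model):

import random
import torch

seed = 42  # change this value between runs to get different initializations
random.seed(seed)
torch.manual_seed(seed)           # seeds the CPU RNG
torch.cuda.manual_seed_all(seed)  # seeds all GPU RNGs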

Question about alphabet

Hello, I have a question about the alphabet.
In the file alphabet.py, the function size returns len(self.instances) + 1; I think this is because of the padding symbol /pad. But in the file seqmodel.py, why do we have to add two more labels for the downstream LSTM? Although we use the original label size for the CRF, the CRF model actually still adds "start" and "end" to the transition matrix. This confused me.
Also, if I do not use the CRF, _, tag_seq = torch.max(outs, 1) may lead to a wrong index.
Thank you~

CRF PZ calculation

In log_sum_exp, why take argmax and then gather instead of just taking max? Are there any gradient flow issues?
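For context, a minimal sketch of a numerically stable log-sum-exp written with max directly (the alternative the question asks about; this is an illustration, not the repository's exact code). Gradients flow through the values returned by torch.max just as they do through gather:

import torch

def log_sum_exp(vec, dim=1):
    # Subtract the per-row max before exponentiating so exp() cannot overflow;
    # adding the max back afterwards keeps the result mathematically unchanged.
    max_score, _ = torch.max(vec, dim, keepdim=True)
    return max_score.squeeze(dim) + torch.log(torch.sum(torch.exp(vec - max_score), dim))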

f1 score is -1, pred_num = 0

The same issue as #22. We used our own dataset to train the NER model. The tag scheme is BIOES (the only difference is that we used "M-" instead of "I-"). These data have been tested on your Lattice LSTM model, where they get accurate p, r and f1 values. So I am confused: why is our f1 score -1 and pred_num = 0 with this model?

word embeddings

Which word embeddings did you use to get the results as displayed?

Question about the F1 score in Section 2.

Hello. I want to know whether the performance on CoNLL 2003 English NER reported in Section 2 is the average or the maximum performance. Did you run a significance test? I think 91.20/91.26 is good, but a significance test result is needed for CoNLL 2003 English NER.

Predicted output sample count not equals to input count

I found that the number of samples in the decode output is less than in the decode input.
Do you just skip sentences with no predicted tag?
Can I get all the predicted results?

build word sequence feature extractor: LSTM...
build word representation...
build CRF...
Decode raw data, nbest: 1 ...
gold_num =  48612  pred_num =  44576  right_num =  32770
raw: time:150.02s, speed:149.76st/s; acc: 0.9298, p: 0.7351, r: 0.6741, f: 0.7033

My input count is 1514142, and my output count is 1510350.

Design of character level feature extractor

Hi,
In the CNN feature extractor, is it the case that within a batch you assume all words to be of the same length? If so, the shorter words must have been padded; won't that disturb the character-level features?

Optimize with sgd

Hi,
I am using NCRF++ on my own dataset.
Adam converges normally in fewer than 20 epochs.

However, optimizing with SGD is extremely hard. I get gradient explosion or non-convergence most of the time.
Removing dropout and l2 regularization and using a very small learning rate makes the training converge, but extremely slowly.

Could you share the parameters you used for training with SGD?
Many thanks!

about char pretrained embedding

Thank you for this excellent open-source code.
I have one question about the pretrained embeddings for characters. In the class Data, we load the pretrained character embeddings, but I do not know where they are used. Maybe I have to add a parameter called pretrain_char_embedding, pass it into the class CharBilstm (for example), and modify the code like below:

if pretrain_char_embedding is not None:
    self.char_embeddings.weight.data.copy_(torch.from_numpy(pretrain_char_embedding))
else:
    self.char_embeddings.weight.data.copy_(torch.from_numpy(self.random_embedding(alphabet_size, embedding_dim)))

f score is -1

In the file demo.train.config I changed the iterations to 100 and batch_size to 32; the dev and test scores are almost always -1. (Note this is on the sample_data that you provided, with the embeddings that you provided.)

Problem

hello,
In this experiment, if the development set is too small, will it have a big impact on the experimental results? Thanks.

How to train model in more Epochs?

Hi, I ran the demo code successfully; it worked pretty well.
I therefore wish to use it in my application, but I have encountered a problem:
even if I set iteration=30, the training process stops after the first epoch.
What should I do in this case?

RuntimeError: invalid argument 0: Tensors must have same number of dimensions: got 3 and 2

Hi, thank you for this code; I have learned a lot from it. But when I run it, I get one problem:

Traceback (most recent call last):
  File "main.py", line 438, in <module>
    train(data, save_model_dir, seg)
  File "main.py", line 265, in train
    batch_charrecover, batch_label, mask)
  File "/Users/fengxiachong/Desktop/PyTorchSeqLabel-master/model/bilstmcrf.py", line 33, in neg_log_likelihood_loss
    scores, tag_seq = self.crf._viterbi_decode(outs, mask)
  File "/Users/fengxiachong/Desktop/PyTorchSeqLabel-master/model/crf.py", line 162, in _viterbi_decode
    partition_history = torch.cat(partition_history).view(seq_len, batch_size, -1).transpose(1,
RuntimeError: invalid argument 0: Tensors must have same number of dimensions: got 3 and 2 at /Users/soumith/minicondabuild3/conda-bld/pytorch_1518371252923/work/torch/lib/TH/generic/THTensorMath.c:2888

Could you please help me?

Bug in IOBES converter?

It appears from your sample that there may be a bug in your IOBES converter, which I'm assuming will affect your paper findings slightly?

An example would be at line 6845 of train.bmes.txt:

English S-MISC
County S-MISC
Championship S-MISC
cricket O
matches O
on O
Thursday O
: O

The original file in IOB1 has this:

English NNP I-NP I-MISC
County NNP I-NP B-MISC
Championship NNP I-NP I-MISC
cricket NN I-NP O
matches NNS I-NP O
on IN I-PP O
Thursday NNP I-NP O
: : O O

problem in config reader

Why is tagScheme initialised with noSeg?

It is not updated upon reading the config file; so far in my experiments tagScheme always defaults to BIO and not BMES. How do you set the tagScheme through the config file? (I don't think you've even written code for that.)

Probability of an Output Sequence

Hi,
I want to get the probability of each output sequence (not the n-best score) when decoding.
How can I get this?
(How can I get the partition function Z when decoding?)
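For reference, in a linear-chain CRF the quantity asked for is

p(y | x) = exp(score(x, y)) / Z(x),    where Z(x) = sum over y' of exp(score(x, y'))

and Z(x) is exactly the forward-algorithm partition function that the training loss already computes; exposing that forward score at decode time would give the per-sequence probability.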

didn't match because some of the arguments have invalid types: (list)

Hi, I'm trying to run the demo training with
python main.py --config demo.train.config, and got:

 ++++++++++++++++++++++++++++++++++++++++
 Hyperparameters:
     Hyper              lr: 0.015
     Hyper        lr_decay: 0.05
     Hyper         HP_clip: None
     Hyper        momentum: 0.0
     Hyper              l2: 1e-08
     Hyper      hidden_dim: 200
     Hyper         dropout: 0.5
     Hyper      lstm_layer: 1
     Hyper          bilstm: True
     Hyper             GPU: True
DATA SUMMARY END.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
build network...
use_char:  True
char feature extractor:  CNN
word feature extractor:  LSTM
use crf:  True
build word sequence feature extractor: LSTM...
build word representation...
build char sequence feature extractor: CNN ...
build CRF...
Epoch: 0/1
 Learning rate is setted as: 0.015
Traceback (most recent call last):
  File "main.py", line 436, in <module>
    train(data)
  File "main.py", line 326, in train
    batch_word, batch_features, batch_wordlen, batch_wordrecover, batch_char, batch_charlen, batch_charrecover, batch_label, mask  = batchify_with_label(instance, data.HP_gpu)
  File "main.py", line 234, in batchify_with_label
    mask[idx, :seqlen] = torch.Tensor([1]*seqlen)
TypeError: mul() received an invalid combination of arguments - got (list), but expected one of:
 * (Tensor other)
      didn't match because some of the arguments have invalid types: (list)
 * (float other)
      didn't match because some of the arguments have invalid types: (list)

Any suggestion would be appreciated. Thanks!

Data format

Hi, for NER, what is the format of the file that the script expects:

AL-AIN NNP I-NP I-LOC
, , O O
United NNP I-NP I-LOC
Arab NNP I-NP I-LOC
Emirates NNPS I-NP I-LOC
1996-12-06 CD I-NP O
....

like the above? I saw that you need some .bmes files.

ask for advice

Hello,
Can I change raw.bmes to test.bmes during decoding? Also,
I want to know the role of raw.bmes.

Can you help me ?

Thanks for sharing this good work; can you help me?
I set the iteration to 15000 and find that the training process is too slow, and the F1 score is always around 0.7. Can you help me with these training problems?

BTW, I find that your f1 depends on precision and recall, but I found that others' work calculates it with acc and recall. Maybe I am wrong. So how can I get your score? And after how many epochs did you get the best score, so that I can set the iteration parameter?

I will appreciate your help!

problems in reproducibility

I have tried running the configuration mentioned in the readme on a GPU, with 10 different seeds.
I am still not able to hit an f-score of 90+ (for non-LSTM based results) or 91+ for LSTM based results.

Documentation for main_parse

I'm trying to get a pre-trained model to run via the command line using main_parse, but am having issues.

It would be helpful to have some documentation on the command line arguments for this function.

Thanks

Strange bug when sentence length exceeds 256

# model/crf.py
#  def _viterbi_decode(self, feats, mask):
length_mask = torch.sum(mask, dim = 1).view(batch_size,1).long()
# mask == [1, 1, ..., 1]   # size: 1 * 256
# torch.sum(mask, dim=1)   # when mask is a ByteTensor, the result is 0

Devastating. This bug dragged me along for 3 days.
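To spell the failure out: a ByteTensor holds uint8 values, so summing 256 ones wraps around to 0 before the .long() cast is applied. A minimal sketch of a fix (assuming you can edit that line) is to cast the mask before summing:

length_mask = torch.sum(mask.long(), dim=1).view(batch_size, 1)  # sum in int64, so 256 ones no longer overflow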

Data information&other tasks' performances

There are two questions I want to ask you:
1. Are the sentence counts of the data used in your code 14987, 3466, 3684 or 14041, 3250, 3453 (train, dev, test respectively)? Can you tell me?
2. Can your code obtain comparable performance on the CoNLL03 German NER data to Lample et al., NAACL 2016, and on the WSJ data to Ma et al., ACL 2016? That is, how does your code perform on other tasks compared with existing related work? Did you give it a try?


Python 3+

By any chance, do you know if this is compatible with Python 3+? I noticed you mentioned 2.7 in your requirements.

Unable to replicate the reported numbers on CoNLL dataset

After 100 epochs on the train-dev-test splits of the CoNLL 2003 dataset, with LSTM and CNN character features, I get the following results for the best dev f-score:

  • LSTM
    Dev: time: 5.59s, speed: 627.24st/s; acc: 0.9891, p: 0.9460, r: 0.9465, f: 0.9463
    Exceed previous best f score: 0.945258548088
    Test: time: 5.57s, speed: 712.46st/s; acc: 0.9808, p: 0.9102, r: 0.9107, f: 0.9104
  • CNN
    Dev: time: 5.32s, speed: 660.84st/s; acc: 0.9891, p: 0.9458, r: 0.9460, f: 0.9459
    Exceed previous best f score: 0.945809491754
    Test: time: 4.88s, speed: 788.98st/s; acc: 0.9804, p: 0.9081, r: 0.9068, f: 0.9074

I'm trying to understand what it takes to reproduce the reported numbers, and also to use this as a baseline for my experiments. Let me know which other parameters I need to change.

Also, thanks for open-sourcing the code!

doubt in metric.py

I think get_ner_BIO() in metric.py is wrong.

Consider the example where label_list = [I-MISC, I-MISC, O, I-PER, I-PER, O, O, O, O, O, I-ORG, O]. According to the current function, the following will happen:

since there is no tag involving B-, whole_tag and tag_index will always be [], and hence the output of the function is [], which is wrong?

Python 3 Support

Thanks for the nice work. A minor issue: you have implemented support for Python 3 by catching the ModuleNotFoundError exception, which is fine for Python 3.6 but will cause an error in versions <=3.5.

A quick solution would be to use ImportError instead of ModuleNotFoundError, at lines 24 and 14 in main.py and utils/data.py, respectively.
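A sketch of the suggested pattern (the imported module here is illustrative, not necessarily what those lines actually import; since ModuleNotFoundError subclasses ImportError, catching the latter covers both):

try:
    import cPickle as pickle  # Python 2
except ImportError:  # raised on every Python version, unlike ModuleNotFoundError (3.6+)
    import pickle  # Python 3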

Can I use cpu running this program?

Hi @jiesutd,

Thanks for sharing your work; it is an incredible job!
My GPU is too old to run this program; I was wondering whether I can run it on the CPU instead.
I set the parameter "#gpu" to "gpu = 0" in the "demo.train.config" file, and it does not work.
Regards,
Thanks!

Integration

Hello, I want to integrate your code with my system (for academic purposes).
I want to enable the system to take as input a stream of tokens and output their respective POS tags.
How can I do that?

thanks

RuntimeError in crf.py line 247

Hi,

I am trying to run the training demo using Python 3.5 and torch 0.4.0 (with CUDA on an NVIDIA 1050 GTX). I get the following error:

THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCTensorCopy.c line=70 error=59 : device-side assert triggered
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = long, Dims = 2]: block: [0,0,0], thread: [0,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
(the same assertion repeats for threads [1,0,0] through [5,0,0])

Traceback (most recent call last):
  File "/home/spike/Software/PyCharm/helpers/pydev/pydevd.py", line 1668, in <module>
    main()
  File "/home/spike/Software/PyCharm/helpers/pydev/pydevd.py", line 1662, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/home/spike/Software/PyCharm/helpers/pydev/pydevd.py", line 1072, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/spike/Software/PyCharm/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/spike/Projects/NCRFpp/main.py", line 434, in <module>
    train(data)
  File "/home/spike/Projects/NCRFpp/main.py", line 326, in train
    loss, tag_seq = model.neg_log_likelihood_loss(batch_word, batch_features, batch_wordlen, batch_char, batch_charlen, batch_charrecover, batch_label, mask)
  File "/home/spike/Projects/NCRFpp/model/seqmodel.py", line 43, in neg_log_likelihood_loss
    total_loss = self.crf.neg_log_likelihood_loss(outs, mask, batch_label)
  File "/home/spike/Projects/NCRFpp/model/crf.py", line 262, in neg_log_likelihood_loss
    gold_score = self._score_sentence(scores, mask, tags)
  File "/home/spike/Projects/NCRFpp/model/crf.py", line 247, in _score_sentence
    tg_energy = tg_energy.masked_select(mask.transpose(1,0))
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/generated/../THCReduceAll.cuh:339

Does anyone know what might be causing this error?

Mask without CRF

In the case when crf is false, you do not use the mask in the loss calculation. Is there any reason for that?

Unfrozen word vector

Hi,
Can you please give some guidelines on how I can unfreeze the word vectors?

I have the embedding file, and each word gets converted to its embedding, but I want this embedding to be tweaked while training (hence "unfreezing"). Can you help?

Thanks
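For reference, a minimal PyTorch sketch of loading pretrained vectors into an embedding layer while keeping it trainable (generic PyTorch, not NCRF++'s own loading code):

import torch
import torch.nn as nn

pretrained = torch.randn(10000, 50)  # stand-in for vectors read from your embedding file
emb = nn.Embedding.from_pretrained(pretrained, freeze=False)  # freeze=False keeps the weights trainable
# Equivalently, for an existing layer: emb.weight.requires_grad = True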

Some questions about the CNN_BILSTM_CRF model

First of all, kudos to the author.
I want to apply this model to a Chinese sequence labeling problem that includes POS tags. Does this conflict with the CNN character features? In your project, can handcrafted features and CNN character features coexist? Also, looking at the preprocessed data format, Friday [Cap]1 [POS]NNP O: since I only use the POS feature, should the data be written as Friday [POS]NNP O? Is [POS] mandatory, or is it just a marker?
Please advise.
