
crfsharp's People

Contributors

amironoff, dmit25, vermorel, zhongkaifu


crfsharp's Issues

Problem with template file

Thanks in advance for this project.

I have a problem: I cannot create a template file for my training corpus. The corpus has two columns, Word and POS:

DELM

اولين ADJ_SUP
سياره N_SING
خارج ADJ_SIM
از P
منظومه N_SING
شمسي ADJ_SIM
ديده ADJ_INO
شد V_PA
. DELM

Please help me figure out how to create a template file for this corpus.
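Assuming the first column is the word and the second (last) column is the POS label, a minimal CRF++-style template that references only the word column might look like the sketch below; the window sizes are just a common starting point, not a requirement:

```
# Unigram
U00:%x[-2,0]
U01:%x[-1,0]
U02:%x[0,0]
U03:%x[1,0]
U04:%x[2,0]
U05:%x[-1,0]/%x[0,0]
U06:%x[0,0]/%x[1,0]

# Bigram
B
```

Note that the label column is never referenced in `%x[...]` macros; only feature columns (here, column 0) are.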

Probability computation performance

Hi Zhongkai! Thanks for bringing this project to the community - really useful stuff!

When calculating prediction variants and their probabilities (vlevel=2), throughput drops by roughly 2.5-3x. I profiled it and found that the bulk of the time is spent in the Forward-Backward algorithm implementation (the calcAlpha/calcBeta methods).

Do you think there's a way to speed this up?

Cheers,
Andrey

Retrain with new tags?

Hi,

I am building a transfer-learning setup on top of CRF. I would like to use the "retrain" feature, but CRFSharp requires the pretrained model to already contain every tag that appears in the new training samples. I could manually include all labels in the first training run, but once a new model is trained on top of an existing one, it only contains the tags present in the new samples.

Is there a way to keep the pretrained model's tags in the target model?
Or, even better, is there a way to provide the set of possible tags separately at the encoding stage?

Format errors

err,20/03/2018 11:47:50  format error: U07:%x[-1,1]
err,20/03/2018 11:47:50  format error: U08:%x[0,1]
err,20/03/2018 11:47:50  format error: U09:%x[1,1]

Example data:
2013. DATE

Template:

# Unigram
U01:%x[-1,0]
U02:%x[0,0]
U03:%x[1,0]
U07:%x[-1,1]
U08:%x[0,1]
U09:%x[1,1]

# Bigram
B

I've made sure my training data is tab-separated.
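One likely cause, assuming each training line has only a token and its label (as in the `2013. DATE` example above): the `%x[...,1]` macros ask for a second feature column (index 1), but the last column is the label rather than a feature, so column 1 does not exist. A template restricted to column 0 should avoid the format errors:

```
# Unigram
U01:%x[-1,0]
U02:%x[0,0]
U03:%x[1,0]

# Bigram
B
```

The U07-U09 rules would only be valid if the data had at least two feature columns before the label.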

Error in decoding

Hello, kindly help me solve the following problem.
I trained CRFSharp for a POS tagging task successfully, but during decoding I got the error below.
I would be very thankful if you could let me know the root cause and a possible solution.
A screenshot of the error is attached:

    "System.ArgumentOutOfRangeException' occurred in mscorlib.dll but was not handled in user code"

AdvUtils.DoubleArrayTrieSearch does not have Load Method

Hi,

I am trying to implement a CRF decoder with this wonderful library in C#, but I get an error when loading the model with the decoder wrapper:

    var crfWrapper = new CRFSharpWrapper.Decoder();
    crfWrapper.LoadModel(options.strModelFileName);

The error I get is:

    System.MissingMethodException: 'Method not found: 'Void AdvUtils.DoubleArrayTrieSearch.Load(System.String)'.'

I am using your latest AdvUtils.dll, but I still get the error. I installed the package from NuGet.
Could you help me?
Thank you very much

Build lexical dictionary failed

Hi, Zhongkai,
I am having trouble training my CRF model. The error message is below:

Shrinking feature set [frequency is less than 2]...
Feature size in total : 5483512189
Saving model meta data...Success
Indexing feature set with 0.95 maximum slot usage rate threshold...
Building 5167189 features into index...
Invalidated value -2147483137 at index 2022959
Build lexical dictionary failed.
Failed!

Is this because of a memory limit?
I didn't find this information on the CRFSharp homepage. Can you help me?

Thanks!

Some feedback about trainfile size and decode inputfile form

  1. Training-file size is limited by your machine's memory: the more memory you have, the larger the corpus you can process.
    If the log still shows "Open and check training corpus and templates...", the corpus is still being loaded.
    If the training file is too large, you can run out of memory entirely.
  2. For the decode input file: if you use a Chinese corpus, or another language that requires word segmentation, check the sentence length.
    If a sentence is too long, the process may stall or crash.

Hope this helps!
Thanks!

Format errors with any training data

I have tried countless training-data files from CoNLL shared tasks, and whenever I decode a model it outputs format errors. How do I solve this? Could it be related to the training files needing to be tab-separated? If so, how do I format them so that decoding succeeds?

feature_cache_size

This error occurs at DecoderTagger.cs, line 405.
How can I deal with it? Is my data too long?
A screenshot is attached.

System.ArgumentOutOfRangeException in the buidFeatures method of the DecoderTagger class

Hi, Zhongkai!

I ran encode mode successfully, but running decode I get a System.ArgumentOutOfRangeException in the buidFeatures method of the DecoderTagger class:

        for (var cur = 0; cur < word_num; cur++)
        {
            feature_cache_row_size = 0;
            for (int index = 0; index < featureIndex.unigram_templs_.Count; index++)
            {
                var templ = featureIndex.unigram_templs_[index];
                var strFeature = featureIndex.apply_rule(templ, cur, this);
                if (strFeature == "")
                {
                    return Utils.ERROR_EMPTY_FEATURE;
                }
                id = featureIndex.get_id(strFeature);
                if (id != -1)
                {
                    feature_cache_[feature_cache_size][feature_cache_row_size] = id;
                    feature_cache_row_size++;
                }
            }
            feature_cache_[feature_cache_size][feature_cache_row_size] = -1; // row terminator
            feature_cache_size++; // grows once per token, but feature_cache_ has a fixed length
        }

In my case, word_num is 864 but feature_cache_ has only 200 entries. feature_cache_size is incremented on each outer iteration, so when cur reaches 200 I get the OutOfRange exception.

Can you help me?
Thanks!

Question about decoding performance

Hi there,

Thanks for sharing a great project. I'm not sure how active this repository is, but I have a question.

I am running the precompiled CRFSharp v1.2.0.0 on Ubuntu 16.04 with Mono 5.2.0.224.

I love the performance increase in encoding (learning), but CPU resources are under-used when I decode files. I run with 30 cores; most cores sit below 10% usage for half of the time and at 30-50% for the rest. (During encoding I see 100% usage on all cores.) I would like to know whether this is inherent in CRFSharp's algorithm or a configuration issue on my side.

One observation: decoding time barely scales with input size, going from about 4 minutes for 5 sentences (each around 150 tags) to 5 minutes for 100 sentences and 10 minutes for 1000 sentences. It looks like most of the delay is in loading the model. I currently use nbest=10, but I previously saw little difference between nbest=1 and nbest=10.

Let me know if there is anything I can look into for this issue.

Thank you!

Limiting predicted tags using a dictionary

Hi @zhongkaifu,

Sometimes we know in advance the possible tags for a given token. Consider POS tagging: if we have a dictionary that maps terms to all of their possible tags, we could use this information to:

  1. Limit CRF# output
  2. Prune impossible paths and save some CPU time (we know that 'pizza' can't be a verb, so don't even score that path)

I'm looking at adding this feature to CRF#. My first approach simply skips calcCost(node) in buildLattice if the node maps to a tag that isn't in the dictionary.

Wondering what your suggestions are to implement this the most pain-free way? :)
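The pruning idea can be sketched independently of CRF#'s internals. Below is a minimal Viterbi decoder in Python where a tag dictionary limits the candidate tags per token (names like `tag_dict`, `emit`, and `trans` are hypothetical, not CRFSharp's API):

```python
def viterbi(tokens, tags, emit, trans, tag_dict):
    """Decode the best tag path, scoring only dictionary-allowed tags.

    tag_dict maps a token to the set of tags it may take; tokens
    absent from the dict fall back to the full tag set.
    emit(token, tag) and trans(prev_tag, tag) return additive scores.
    """
    allowed = [tag_dict.get(t, set(tags)) for t in tokens]
    # delta[i][y] = best score of any path ending at position i with tag y
    delta = [{y: emit(tokens[0], y) for y in allowed[0]}]
    back = [{}]
    for i in range(1, len(tokens)):
        delta.append({})
        back.append({})
        for y in allowed[i]:  # pruned: tags outside the dictionary are never scored
            prev, score = max(
                ((p, delta[i - 1][p] + trans(p, y) + emit(tokens[i], y))
                 for p in allowed[i - 1]),
                key=lambda x: x[1])
            delta[i][y] = score
            back[i][y] = prev
    # backtrack from the best final tag
    y = max(delta[-1], key=delta[-1].get)
    path = [y]
    for i in range(len(tokens) - 1, 0, -1):
        y = back[i][y]
        path.append(y)
    return path[::-1]
```

This matches your approach of skipping calcCost for excluded nodes; the main care point is tokens missing from the dictionary, which should fall back to the full tag set rather than an empty one.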
