
natural's People

Contributors

adamb0mb, adrianmcli, alexlangberg, chrisumbel, danielruf, dav009, dependabot[bot], gmarty, hugo-ter-doest, johnmarkos, joscha, kbrabrand, kkoch986, kostasx, lfilho, liwenzhu, mbc1990, mdantas, mdmower, mef, moos, mullr, nishant8bits, rawpixel-vincent, reblws, saidelimam, seejohnrun, thomashuet, tj, vladimir-polyakov

natural's Issues

find if a word is misspelled

Hello & congrats on your work on this library. I have begun to use natural and I want to ask: is there any way to find out whether a word is misspelled, or whether a word is an English word? Any suggestions?

make test

what test framework is this? we should add a test target so it's easier to run

Multiple download of WordNet DB files

When a WordNet API function is called for the first time, the program tries to download the DB files. If multiple API calls are made in quick succession, multiple download requests get queued. This resulted in truncated files and hanging API calls for me.

WordNet 3.0 raw data files are now available as an npm module, WNdb. It can be installed with 'npm install WNdb' or added to the dependency list.

I think it would be more robust to fetch the files at install time, or have the user install them manually if size (about 10 MB compressed) is an issue.
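
For what it's worth, a sketch of the install-time approach, assuming WNdb exposes the directory of its data files as a path property (check the WNdb README for the exact name in your version):

var natural = require('natural');
var WNdb = require('WNdb'); // npm install WNdb

// point WordNet at the locally installed data files so nothing is downloaded at runtime
var wordnet = new natural.WordNet(WNdb.path);

wordnet.lookup('node', function(results) {
    console.log(results.length);
});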

stemming and classification problems

Here is the whole code:

var natural = require('natural'),
classifier = new natural.BayesClassifier(natural.PorterStemmer);
classifier.addDocument("Master Chief returns in Halo 4, part of a new trilogy in the colossal Halo universe.Set almost five years after the events of Halo 3, Halo 4 takes the series in a new direction and sets the stage for an epic new sci-fi saga, in which the Master Chief returns to confront his destiny and face an ancient evil that threatens the fate of the entire universe. Halo 4 also introduces a new multiplayer offering, called Halo Infinity Multiplayer, that builds off of the Halo franchise's rich multiplayer history. The hub of the Halo 4 multiplayer experience is the UNSC Infinity – the largest starship in the UNSC fleet that serves as the center of your Spartan career. Here you’ll build your custom Spartan-IV supersoldier, and progress your multiplayer career across all Halo 4 competitive and cooperative game modes.", 'halo');
classifier.addDocument("The hunted becomes the hunter in the CryEngine-powered open-world shooter Crysis 3! Players take on the role of 'Prophet' as he returns to New York in the year in 2047, only to discover that the city has been encased in a Nanodome created by the corrupt Cell Corporation. The New York City Liberty Dome is a veritable urban rainforest teeming with overgrown trees, dense swamplands and raging rivers. Within the Liberty Dome, seven distinct and treacherous environments become known as the Seven Wonders. This dangerous new world demands advanced weapons and tactics. Prophet will utilize a lethal composite bow, an enhanced Nanosuit and devastating alien tech to become the deadliest hunter on the planet.Prophet is on a revenge mission after uncovering the truth behind Cell Corporation's motives for building the quarantined Nanodomes. The citizens were told that the giant citywide structures were resurrected to protect the population and to cleanse these metropolises of the remnants of Ceph forces. In reality, the Nanodomes are CELL's covert attempt at a land and technology grab in their quest for global domination. With Alien Ceph lurking around every corner and human enemies on the attack, nobody is safe in the path of vengeance. Everyone is a target in Prophet's quest for retribution.", 'crysis');
classifier.train();


console.log(classifier.classify('nano'));
console.log(classifier.classify('evil'));

Both logs print "halo"; the first one should print "crysis". How can I fix it?

unable to open .../WNdb/dict/index.adv

When I perform the following a couple of million times, I get the error "unable to open .../WNdb/dict/index.adv":

function isWord(text, cb) {
    wordnet.lookup(text, function(results) {
        cb(Array.isArray(results) && results.length > 0);
    });
}

Is there anything I can do to resolve this?
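
One possible workaround (just a sketch, not part of natural): memoize the results so repeated words don't reopen the index files millions of times; how much this helps depends on how skewed the word distribution is.

var cache = {};

function isWord(text, cb) {
    if (cache.hasOwnProperty(text)) {
        return process.nextTick(function() { cb(cache[text]); });
    }
    wordnet.lookup(text, function(results) {
        cache[text] = Array.isArray(results) && results.length > 0;
        cb(cache[text]);
    });
}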

Tokenization with punctuation removed.

Hi Gents,

I've just started looking at this package for Node and it looks interesting. I am testing the tokenizer and it works fine, but it doesn't seem to strip out commas and other punctuation. I'm using it like so:

var tokenizer = new natural.TreebankWordTokenizer();
tokenizer.tokenize(someString);

Looking at the source code, it looks like it should work. Are you aware of any current issues with this?
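
In case it helps, a minimal sketch with natural's WordTokenizer, which splits on non-alphanumeric characters and therefore drops punctuation (the Treebank tokenizer deliberately keeps punctuation as separate tokens):

var natural = require('natural');
var tokenizer = new natural.WordTokenizer();

console.log(tokenizer.tokenize('Hello, world. How are you?'));
// expected: [ 'Hello', 'world', 'How', 'are', 'you' ]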

Cheers.

Normalizer?

I'd like to contribute a normalizer for Japanese. This is basically a set of replacement functions to normalize Japanese input before further processing. It can also be used to do some conversion (such as full-width <-> half-width characters).

But I'm not sure where to put it; there's no such tool for any language yet in natural. Should it go under a new folder at /lib/natural/normalizer?

Please let me know if there's a more appropriate place.
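
For illustration, a minimal sketch of one such replacement (not tied to any existing natural API): converting full-width ASCII variants and the ideographic space to their half-width equivalents.

function toHalfWidth(input) {
    return input
        .replace(/[\uFF01-\uFF5E]/g, function(ch) {
            // full-width ASCII variants are offset from ASCII by 0xFEE0
            return String.fromCharCode(ch.charCodeAt(0) - 0xFEE0);
        })
        .replace(/\u3000/g, ' '); // ideographic space
}

console.log(toHalfWidth('ＡＢＣ　１２３')); // "ABC 123"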

Example of Sentence Analyzer Use

Great project! I saw that there is a sentence analyzer, with some notes as follows:

Take a POS input and analyse it for

  • Type of Sentence
    • Interrogative

      - Tag Questions

    • Declarative
    • Exclamatory
    • Imperative
  • Parts of a Sentence
    • Subject
    • Predicate
  • Show Preposition Phrases

However, I didn't see any examples of using the sentence analyzer. In particular, I am unclear how to generate the necessary POS input. Are there some examples knocking around?

Many thanks in advance
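
Not an official example, but in more recent versions of natural a POS tag sequence can be produced roughly as below; the constructor arguments and the shape of the result vary by version, so treat this as a sketch rather than the exact API.

var natural = require('natural');

var lexicon = new natural.Lexicon('EN', 'N'); // 'N' is the default tag
var ruleSet = new natural.RuleSet('EN');
var tagger = new natural.BrillPOSTagger(lexicon, ruleSet);

var tokens = new natural.WordTokenizer().tokenize('The dog chased the cat');
console.log(tagger.tag(tokens)); // tagged tokens; the exact shape depends on the version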

0.0.58 Classifiers: Longer training time.

I just updated my code to use the 0.0.58 classifiers. Using both Bayes and LogisticRegression, I called addDocument() a number of times, then called .train(). With the new version, this last train step takes a lot longer to complete than with the old version (I let it run for 20 minutes before killing it).

I've also tried calling train() after each addDocument() call. That approach takes 18 minutes for 251 items.

The older version could train about 5k items in under 10 minutes on the same hardware. Is there something else I should be doing to get the same performance as the older version?

Options for Levenshtein

There is no way to set any of insertion_cost/deletion_cost/substitution_cost to a zero value.

Could you fix it please? :)
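
A guess at the cause (the actual implementation may differ): if the defaults are applied with ||, a legitimate cost of 0 is treated as unset. A zero-safe default would fix it, e.g.:

var options = { insertion_cost: 0, deletion_cost: 1, substitution_cost: 1 };

// defaulting with || silently turns 0 into 1
var insertionCost = options.insertion_cost || 1;

// a zero-safe default keeps 0 as 0
insertionCost = (typeof options.insertion_cost === 'number') ? options.insertion_cost : 1;

console.log(insertionCost); // 0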

TF-IDF calculation

In the example, "node" in document 3 should be unadjusted log(5 / 2) or adjusted log(5 / 3). However, it currently comes out as log(6 / 3).

Just removing the +1 on line 71 works.
Or use an unadjusted version:

 67     var docsWithTerm = this.documents.reduce(function(count, document) {
 68         return count + (documentHasTerm(term, document) ? 1 : 0);
 69     }, 0);
 70 
 71     if(docsWithTerm) 
 72         return Math.log(this.documents.length / docsWithTerm);
 73     else
 74         return 0;
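
For concreteness: ln(5/2) ≈ 0.916 and ln(5/3) ≈ 0.511, whereas the current behaviour gives ln(6/3) = ln 2 ≈ 0.693.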

Thanks.

classify with bayes doesn't output score

I tried the Bayes example, and the classify method only returns the "best match" but not the whole score list. Is this an error, or do I have the wrong version? (0.1.20)
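
For reference, the full score list should be available via getClassifications(), which returns an array of {label, value} pairs; classify() only returns the label of the top entry:

var classifications = classifier.getClassifications('some text to classify');
classifications.forEach(function(c) {
    console.log(c.label, c.value);
});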

Retrieving keywords from domains (or text without delimiters)

I have a list of domain names and would like to extract the keywords that exist in these names. For example:

expertsexchange.com

  • experts exchange

penisland.com

  • pen island

choosespain.com

  • choose spain

kidsexpress.com

  • kids express

childrenswear.com

  • childrens wear

dicksonweb

  • dickson web

Much like:

  1. http://stackoverflow.com/questions/1315373/programmatically-extract-keywords-from-domain-names
  2. http://stackoverflow.com/questions/195010/how-can-i-split-multiple-joined-words

Would natural be able to help me with this?
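
As far as I know natural has no word-segmentation tool, but a dictionary-based "word break" search is a common approach. A minimal sketch with a placeholder dictionary (in practice you would load a real word list, e.g. from WordNet or a corpus):

var dictionary = { experts: 1, exchange: 1, pen: 1, island: 1, choose: 1,
                   spain: 1, kids: 1, express: 1, childrens: 1, wear: 1 };

function segment(name) {
    var best = [[]]; // best[i] = tokens covering name.slice(0, i), or null
    for (var i = 1; i <= name.length; i++) {
        best[i] = null;
        for (var j = 0; j < i; j++) {
            var piece = name.slice(j, i);
            if (best[j] && dictionary[piece]) {
                // prefer segmentations made of fewer (i.e. longer) words
                if (!best[i] || best[j].length + 1 < best[i].length) {
                    best[i] = best[j].concat(piece);
                }
            }
        }
    }
    return best[name.length]; // null if the name cannot be fully segmented
}

console.log(segment('expertsexchange')); // [ 'experts', 'exchange' ]
console.log(segment('penisland'));       // [ 'pen', 'island' ]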

Broken requires in tfidf_spec.js

Ran into an issue where natural breaks my application's test suite because it behaves differently when NODE_ENV=test.

This is bad, breaking behavior. If natural needs to act differently when it's being tested, it should only do so when natural itself is the target of the tests. Otherwise it breaks applications above it. In this case I'm using kue, which relies on reds, which relies on natural.

Named entity recognition

Do you have any plans for named entity recognition? I have seen that it would require a sequential classifier, with the ability to train it on your own data set (a JSON document) of POS tags and other key attributes.

"Victorys" instead of "Victories"

I tested natural's NounInflector pluralization (pluralizeNoun()) with "Victory", "Party" and "Repository", and the output was always wrong.

Example

var natural = require('natural'),
    nounInflector = new natural.NounInflector();
nounInflector.attach();

console.log("Victory".pluralizeNoun());

Jaro-Winkler Distance Algorithm Match Value Question

Firstly, a massive thank-you for putting this library together. It's been really helpful implementing some text similarity behaviour in an application I'm currently writing.

Secondly, I just wanted to ask a question regarding the match value that is returned by the JaroWinklerDistance algorithm. On reading the Wikipedia article, I definitely got the impression that a comparison between two strings that are exactly the same should return a result of 1. I'm finding, however, that this isn't the case.

In the application I'm currently writing, I'm using the functionality to order place names in order of their match strength against an original request. In the case of sydney, this works as expected:

> natural.JaroWinklerDistance('sydney', 'sydney');
1

However, when I did a comparison for seddon against seddon, it returns less than 1:

> natural.JaroWinklerDistance('seddon', 'seddon');
0.8933333333333334

I went on to do some other smaller tests in the node console to see if I could have the function produce a higher score for two different strings than two that were exactly the same, as realistically, this is the case that I'm worried might occur. After a little bit of playing around I found that I could:

> natural.JaroWinklerDistance('abc', 'abc');
0.8666666666666666
> natural.JaroWinklerDistance('abcd', 'abcd');
1
> natural.JaroWinklerDistance('abcd', 'abc');
0.9416666666666667

Once I grok the algorithm, I'll have a look at forking the code and seeing if I can work out where this is happening, but I thought I'd raise an issue first to see if it was something that someone else knew how to fix quickly and simply.

Thanks again for your efforts on the library.

Cheers,
Damon.
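
For reference, the Jaro similarity of two strings s1 and s2 is (1/3) * (m/|s1| + m/|s2| + (m - t)/m), where m is the number of matching characters and t is half the number of transpositions. For identical strings m = |s1| = |s2| and t = 0, so the score should always be exactly 1; the results above suggest the match-counting step is where things go wrong.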

Irregular inflections don't inherit case

If I add an irregular inflection to the NounInflector, it does not carry over the case of the inflected word. For example:

var inflector = new NounInflector();
inflector.addIrregular('survey', 'surveys');

var plural = inflector.pluralize('Survey');
// `plural` == 'surveys'

plural = inflector.pluralize('SURVEY');
// `plural` == 'surveys'

Indeed, there's a case which specifies this as being expected behaviour in noun_inflector_spec.js:

expect(inflector.pluralize('ox')).toBe('oxen');
expect(inflector.pluralize('OX')).toBe('oxen');


Should it not attempt to preserve the capitalization and uppercasing of the input?

Better error messages when .train() hasn't been called on BayesClassifier

If one forgets to call .train() on their classifier, here is the error you get from .classify():

> b.classify('dogs');
TypeError: Cannot read property 'label' of undefined
at [object Object].classify (/home/tlack/apps/classifier/node_modules/natural/node_modules/apparatus/lib/apparatus/classifier/classifier.js:37:51)
at [object Object].classify     (/home/tlack/apps/classifier/node_modules/natural/lib/natural/classifiers/classifier.js:84:28)

This is a pretty common mistake; it might be nice to throw a descriptive exception in this case instead of just bombing out.

Glossary

Is a glossary-like feature implemented in natural? I couldn't find it in the readme.

Ref: Glossary

Cheers!

Any plan on support for pure javascript on client side?

I did not dig too deep to figure out whether the current version can be used on the client side as it is, but given the training data needed I guess not.

Is there any plan to make it usable in the browser rather than through Node?

Thanks for your efforts. This will be a useful resource in the future.

Error when using BayesClassifier.load

Hey,

When trying to do this:

natural.BayesClassifier.load('classifier.json', null, function(err, classifier) {
console.log(classifier.classify("this is a test"));
});

(copied very closely from the examples -- same result with or without the null argument)

I get this error:

/Users/username/Sites/natural/node_modules/apparatus/lib/apparatus/classifier/bayes_classifier.js:95
classifier.__proto__ = BayesClassifier.prototype;
^
TypeError: Cannot set property '__proto__' of undefined
at Function.restore (/Users/username/Sites/natural/node_modules/apparatus/lib/apparatus/classifier/bayes_classifier.js:95:27)
at restore (/Users/username/Sites/natural/lib/natural/classifiers/bayes_classifier.js:37:54)
at /Users/username/Sites/natural/lib/natural/classifiers/bayes_classifier.js:44:23
at /Users/username/Sites/natural/lib/natural/classifiers/classifier.js:104:13

I cloned the repo locally so I could try to debug it, but I haven't had any luck figuring out why it thinks classifier is undefined there. If I log what classifier is before the restore function is called (in the load function of classifier.js), it seems to have the right object. However, inside the restore function it does think classifier is undefined all the way through.

This did work before the switch to apparatus. Any help would be greatly appreciated!

Tokenizer documentation

I am looking for a tokenizer for my short-text categorization application. Natural.js seems to contain a lot of useful tokenizers; however, I find it hard to understand what they do.

For example: what is the "TreebankWordTokenizer"? What is the "Aggressive tokenizer"?

What tokenizer is the most commonly used in text-categorization applications?

Error in isVowel() in double_metaphone.js

Hi, I came across an error in isVowel(c) where c was undefined, so c.match threw an error:

function isVowel(c) {
    if(!c) return false; // i added this as a workaround
    return c.match(/[aeiouy]/i);
}

This happened when isVowel gets called from handleH on line 223. I think the problem is that token[pos+1] is undefined.

    function handleH() {
        // keep if starts a word or is surrounded by vowels
        if((pos == 0 || isVowel(token[pos - 1])) && isVowel(token[pos + 1])) {
            add('H');
            pos++;
        }
    }

Jaro-Winkler Infinite Loop

Running...

natural.JaroWinklerDistance('aaa', 'abcd');

...which gives matches arrays of...

matches1= [ true, true, ]
matches2= [ true, , , ]

...causes an infinite while loop in the "count transpositions" section.

I don't have a solid enough understanding of the algorithm to know if this is a simple boundary check error or a sign of a more fundamental problem.

Missing letters?

var natural = require('natural');
natural.PorterStemmer.attach();
console.log('President Constitution'.tokenizeAndStem());

returns ['presid', 'constitut']. Is this the expected result?

Global leak (variable `term`) in TfIdf.listTerms

Using version 0.1.19 of natural, mocha detects a global "leak" whenever TfIdf.listTerms is invoked.

The same issue seems to exist in the most recent version of lib/natural/tfidf/tfidf.js as well.

Specifically, line 101 of tfidf.js, reads:

    for(term in this.documents[d]) {

I believe this should be:

    for(var term in this.documents[d]) {

(adding var between for( and term.)

For your convenience, the full context for this line is below:

TfIdf.prototype.listTerms = function(d) {
    var terms = [];

    for(term in this.documents[d]) {
        terms.push({term: term, tfidf: this.tfidf(term, d)})
    }

    return terms.sort(function(x, y) { return y.tfidf - x.tfidf });
}

(These are lines 98-106 of tfidf.js, both in the most recent version of tfidf.js currently in the GitHub repo, as well as the 0.1.19 version distributed thru npm.)

(Thanks for an all-around awesome library, by the way.)

.load + .restore

Following the example in the README, it throws an error:

var natural = require('natural');
var classifier = new natural.BayesClassifier();
classifier.addDocument(['sell', 'gold'], 'sell');
classifier.addDocument(['buy', 'silver'], 'buy');

// serialize
var raw = JSON.stringify(classifier);

// deserialize

var restoredClassifier = natural.BayesClassifier.restore(JSON.parse(raw));
console.log(restoredClassifier.classify('sell'));

It throws the following error

    return this.getClassifications(observation)[0].label; 
                                                  ^
TypeError: Cannot read property 'label' of undefined

Bayesian values... too small

Me again (one more time), but this time I checked 20 times whether I was doing something wrong :)

Ok, I'm trying to auto-detect spam from my DB. I have a lot of spam (maybe 800 items) and about 300 good comments.

I wrote this script:

var natural = require('natural');
var classifier = new natural.BayesClassifier();

//my own data getter
var db  = require('./libs/db');

var tickets = [];
var trained = false;
var count = 0;
var min = 10; // minimum number of manual spam/not-spam insertions


// this function gets a comment from a ticket and goes back to "question" afterward.
// This is where you can see my troubles.
function ask(i, ticket, callback) {

    // remove some chars...
    var comment = ticket.comments[i].content;
    comment = (ticket.comments[i].authorlogin || '') + ' ' + (ticket.comments[i].authoremail || '') + ' ' + (ticket.comments[i].authorsite || '') + '\n' + comment;
    comment = comment.toLowerCase();
    comment = comment.replace(/[<>:]/g,' ')
    comment = comment.replace(/[àäâ]+/g,'a');
    comment = comment.replace(/[éêè]+/g,'e');
    comment = comment.replace(/[îï]+/g,'e');
    comment = comment.replace(/[ôö]+/g,'o');
    comment = comment.replace(/[ûüù]+/g,'u');
    comment = comment.replace(/[ŷÿ]+/g,'y');

    // false when bayesian calc is > 0.5
    var shouldtrain = true;
    var m = -1;
    count++;

    // if we have enough data, try to auto-classify
    if (trained) {
        var cl = classifier.getClassifications(comment);
        for (var c in cl) {
            m = Math.max(m, cl[c].value)
        }
        console.log(m)
        // auto classification is considered a success when m > 0.5
        if(m > 0.5) {
            // never reached... m is always very small, e.g. 4.0686547471403846e-39,
            // meaning 0.0000000000000000000000000000000000000004068....
            console.log("=== AUTO OK ===");
            var cla = classifier.classify(comment);
            console.log(comment + " :: " + cla + ' -> '+cl[0].value)
            shouldtrain = false;
            classifier.addDocument(comment, cla);
        }
    }

    //train "min" times, then call "train" and save
    if (count > min){ 
        classifier.train();
        trained = true;
        classifier.save('classifier.json', function (){
            console.log('saved')
        })
    }



    // auto classification was not a success, or we don't have enough data yet
    if (shouldtrain) {

        process.stdin.resume();
        process.stdin.write(comment+'\n');
        process.stdin.write("[s]pam, [n]ot-spam:");

        process.stdin.once('data', function(d){
            var d = d.toString().trim();

            switch(d) {
                case "s":
                case "spam":
                    console.log('Set it to spam')
                    ticket.comments[i].spam="spam";
                    classifier.addDocument(comment, "spam")
                    break;

                case "n":
                case "not-spam":
                default:
                    console.log('Set it to non-spam')
                    ticket.comments[i].spam="not-spam";
                    classifier.addDocument(comment, "not-spam")
            }
            callback(ticket,i)
        });
    } else {
        callback(ticket, i)
    }

}

db.query('metal3d','blog',{}, function (doc) { tickets.push(doc)},
function (){
    var i = 0;
    var ticket = null;
    var question = function (ticket, idx){
        // we've got ticket, we are back from "ask" function
        if (ticket != null) {
            // if next comment on ticket
            if (ticket.comments && ticket.comments.length > idx+1 ) {
                ask(idx+1, ticket, question)
            }
            // seek ticket with comment
            else {
                i++;
                while (!tickets[i].comments || tickets[i].comments.length < 1 ) {
                    i++;
                }
                ticket = tickets[i];
                ask(0, ticket, question);
            }
        }
        // no ticket, it's the first call to "ask" function
        else {
            i=0;
            while (!tickets[i].comments.length) {
                i++;
            }
            ticket = tickets[i];
            ask(0, ticket, question)
        }
    }
    // let's start
    question();
});

To be precise: I loop over my tickets, and for each ticket and each of its comments I do this:

  • if fewer than "min" items have been classified manually, ask whether the comment is spam or not
    • append it to the right classification
    • go to the next one
  • else
    • try to auto-classify
      • if the max result is > 0.5 (50%)
        • add the document to that classification
        • train()
      • else
        • ask which classification to use
        • append it to the given classification
        • train()

I watch "max" value (and I tried to see the entire getClassifications result) I always have a very little number (4.55566678789e-39).

I'm sure that a lot of spam has repeated values (tramadol, xanax, and multiple time the same sender email)

I'm sure my datas are good (they are displayed) and as you can see, I removed accentuated chars, "<" ans ">" etc... I really tried a lot of possibilities...

If you need my datas to check, I can give you a mongo export and my db.js file.

PS: my db module use last function as callback that is lauched AFTER getting the entire database. There is no asynch calculation is this script.

Best regards
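
A note on the magnitude (an educated guess, not a confirmed diagnosis): the values returned by getClassifications() are products of many per-token likelihoods rather than normalized probabilities, so for long comments they shrink toward zero and a fixed threshold such as 0.5 is never reached. Comparing the classes against each other is more robust, for example:

var cl = classifier.getClassifications(comment);
var best = cl.reduce(function(a, b) { return b.value > a.value ? b : a; });
var total = cl.reduce(function(sum, c) { return sum + c.value; }, 0);
var relative = total > 0 ? best.value / total : 0; // share of the winning class

if (relative > 0.9) {
    // confident enough to auto-classify
    classifier.addDocument(comment, best.label);
}

(If the values underflow all the way to 0, even this won't help and the comparison would have to be done in log space.)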

Wordnet via SQLite

Are you aware that WordNet is readily available in various database formats? (http://wnsql.sourceforge.net/) SQLite seems an ideal one for this application, and may make the job much easier. Would you be open to a pull request that replaces the current implementation with an SQLite-based one?

  • Russell

Naive Bayes getClassifications wrong sorting

I trained the classifier with over 3000 documents. When I called getClassifications I noticed the values were not sorted, so the first value was not the highest probability. I changed the sign to sort from lowest probability to highest and then returned the last value. That did the trick. Here is the code:

function getClassifications(observation) {
    var classifier = this;
    var labels = [];

    for(var className in this.classFeatures) {
        labels.push({label: className,
            value: classifier.probabilityOfClass(observation, className)});
    }

    return labels.sort(function(x, y) { return y.value - x.value });
}

and instead of calling classify I did the following:

result = classifier.getClassifications(test[i]);
response.push([test[i], result[result.length - 1].label])

offsets on wordnet

hey, the wordnet integration is awesome! that dataset is really hard to work with. awesome stuff dude!
anyway, I found a bug in lookupSynonyms()

fs.js:248
 binding.read(fd, buffer, offset, length, position, wrapper);
      ^
Error: Offset is out of bounds

here's the code:

var natural = require('natural');
var wordnet = new natural.WordNet('.');

wordnet.lookupSynonyms('hot', function(results) {
    results.forEach(function(result) {
        console.log(result.lemma + '  -  ' + result.pos);
    });
});

cheers man, i'm working on a fork of that jspos tagger that i can integrate if you'd like

n-grams should support start and end symbols

In text categorization, when using n-grams as features, it is common practice to add "start" and "end" symbols to the strings. In my experience, this has a significant effect on performance.

For example, when finding bigrams in the sentence "I went home", the result should be:

["[start] I", "I went", "went home", "home [end]"]

International Support

Any plans for international support? I am trying to use the tokenizer to parse Arabic words.
