nlp's Introduction

NLP

Practice and examples of using the nltk library for NLP.

  • Corpus
    A large body of natural-language text used to gather statistics about the language. The plural is corpora.
  • Lexicon
    A lexicon is a collection of information about the words of a language and the lexical categories to which they belong. A lexical entry also records the roles the word can play.
    Example: in English, "bull" means an animal, but to an investor it also means a rising (positive) market.
  • Tokenization
    Splitting a body of text into sentences (sentence tokenizer) and words (word tokenizer).
  • StopWords
    Words that carry little meaning for the analysis, such as "the" or "is", so they are removed from the text.
  • Stemming
    Normalizing words by stripping their affixes.
    Example: riding becomes ride once the -ing affix is normalized away.
    Stemming algorithms include PorterStemmer, LancasterStemmer and SnowballStemmer.
  • Lemmatizing
    Similar to stemming, but stemming can often create non-existent words, whereas lemmas are actual words.
    A stemmed word may not be something you can look up in a dictionary, but you can look up a lemma.
  • POS - Part Of Speech tagging
    Labeling the words in a sentence as nouns, adjectives, verbs, etc., along with their tense forms.
    For the complete list of POS tags, refer to nlpfile.py in the repository.
  • NER - Named Entity Recognition
    Pulling out entities such as people, places, things, locations and monetary figures.
  • Chunking
    Grouping the words of a text into phrases, generally around nouns but also verbs and other parts of speech, to get an idea of what the sentence is about.
  • Chinking
    Chunking without the chink, i.e. grouping everything except certain parts of speech.
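
The terms above can be tried out with a short sketch. It is not taken from nlpfile.py; it assumes nltk is installed and sticks to pieces that need no corpus download (wordpunct_tokenize, PorterStemmer, RegexpParser). The stop-word set, the hand-tagged sentence, and the helper names preprocess and noun_phrases are made up for illustration.

```python
# Illustrative sketch: only NLTK components that need no downloads.
from nltk import RegexpParser
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# Hypothetical hand-picked stop words, for illustration only.
# NLTK ships a fuller list in nltk.corpus.stopwords (needs a download).
STOP_WORDS = {"the", "a", "an", "is", "at", "in", "on", "of"}


def preprocess(text):
    """Tokenize, drop stop words and punctuation, then stem."""
    stemmer = PorterStemmer()
    tokens = wordpunct_tokenize(text.lower())
    kept = [t for t in tokens if t.isalpha() and t not in STOP_WORDS]
    return [stemmer.stem(t) for t in kept]


def noun_phrases(tagged, grammar):
    """Return the NP chunks that a RegexpParser grammar finds."""
    tree = RegexpParser(grammar).parse(tagged)
    return [
        " ".join(word for word, _tag in subtree.leaves())
        for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")
    ]


# Hand-tagged sentence (assumed tags), so no tagger model is needed.
TAGGED = [("the", "DT"), ("little", "JJ"), ("dog", "NN"),
          ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]

# Chunking: optional determiner, any adjectives, then a noun.
CHUNK_GRAMMAR = "NP: {<DT>?<JJ>*<NN>}"

# Chinking: chunk everything, then cut verbs and prepositions out.
CHINK_GRAMMAR = r"""
NP:
  {<.*>+}
  }<VBD|IN>+{
"""

if __name__ == "__main__":
    print(preprocess("The dog is riding in the car."))
    print(noun_phrases(TAGGED, CHUNK_GRAMMAR))
    print(noun_phrases(TAGGED, CHINK_GRAMMAR))
```

Both grammars yield the same two noun phrases for this sentence: the chunk grammar lists what to keep, while the chink grammar lists what to leave out. Sentence tokenization, lemmatizing, POS tagging and NER in nltk rely on downloadable models and corpora, so they are left out of this sketch.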

nlp's People

Contributors

kapoor-rakshit