InfoR is a Python package for information retrieval. Information retrieval means that, given a set of (text/HTML/XML) documents, the system extracts the documents most relevant to a search query. A search engine such as Google is a retrieval system.
InfoR supports three types of retrieval systems:
- Vector Space Models
- Language Models
- Probabilistic Models
For more information (no pun intended!) on these models see http://nlp.stanford.edu/IR-book/
Google uses the PageRank algorithm, which exploits the hyperlinks in an HTML document. This package currently works only on a corpus of text documents. I hope to add HTML/XML support and to include an implementation of PageRank.
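For readers unfamiliar with PageRank, here is a minimal sketch of the idea using power iteration. It is not part of InfoR. The function name `pagerank`, the `links` dictionary format, and the default damping factor of 0.85 are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Illustrative PageRank by power iteration (not part of InfoR).

    `links` is a hypothetical input format: a dict mapping each page
    to the list of pages it links to.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives the same small "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank
```

Pages that receive links from many well-ranked pages end up with a higher score, which is the signal a plain text corpus cannot provide.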
Download : https://pypi.python.org/pypi/infor/
Installation : pip install infor
Dependencies:
Documentation: https://pythonhosted.org/infor/
Usage:
from InfoR.VectorSpaceModels import VSM, LanguageModel, ProbModel
vector space model
out = VSM(corpus)
out.search(query, number_of_docs_to_be_returned, tf_idf=True, LSA=True, n_comp=3)
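To show what a vector space search with TF-IDF weighting does in principle, here is a minimal standalone sketch. It is not InfoR's implementation. It assumes the corpus is a list of strings, uses whitespace tokenization, and leaves out the LSA step. The names `tfidf_vectors`, `cosine` and `vsm_search` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document, plus the IDF table."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def vsm_search(docs, query, k):
    """Return the indices of the k documents most similar to the query."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the corpus carry no weight.
    query_vec = {
        term: tf * idf[term]
        for term, tf in Counter(query.lower().split()).items()
        if term in idf
    }
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(vectors)]
    # Highest score first; ties are broken by document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]
```

With LSA, the TF-IDF vectors would additionally be projected onto a small number of latent dimensions (presumably what `n_comp` controls) before similarities are computed.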
language model
out = LanguageModel(corpus)
out.search(query, number_of_docs_to_be_returned)
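A common way to rank with a language model is the query likelihood approach: score each document by the probability that its own unigram model generates the query. The sketch below illustrates that idea with Jelinek-Mercer smoothing. Whether InfoR's LanguageModel uses this exact scheme is an assumption. The function name `lm_search`, the `lam` parameter and the list-of-strings corpus are hypothetical.

```python
import math
from collections import Counter


def lm_search(docs, query, k, lam=0.5):
    """Rank documents by smoothed query likelihood (illustrative sketch)."""
    tokenized = [doc.lower().split() for doc in docs]
    # Term counts over the whole collection, used for smoothing.
    collection = Counter(term for tokens in tokenized for term in tokens)
    total = sum(collection.values())
    # Terms unseen in the whole collection are skipped to avoid log(0).
    query_terms = [t for t in query.lower().split() if t in collection]

    scored = []
    for i, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        length = len(tokens)
        log_prob = 0.0
        for term in query_terms:
            p_doc = tf[term] / length if length else 0.0
            p_coll = collection[term] / total
            # Mix the document model with the collection model.
            log_prob += math.log(lam * p_doc + (1.0 - lam) * p_coll)
        scored.append((log_prob, i))

    # Highest log-likelihood first; ties are broken by document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]
```

Smoothing matters here because a document that lacks a single query term would otherwise get probability zero, no matter how well it matches the rest of the query.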
probabilistic model
out = ProbModel(corpus)
out.search(query, number_of_docs_to_be_returned)
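As an illustration of probabilistic ranking, the sketch below scores documents with Okapi BM25, one of the probabilistic models covered in the IR book linked above. This is an assumption for illustration only: ProbModel may use a different formula. The name `bm25_search`, the parameters `k1` and `b`, and the list-of-strings corpus are hypothetical.

```python
import math
from collections import Counter


def bm25_search(docs, query, k, k1=1.5, b=0.75):
    """Rank documents with Okapi BM25 (illustrative sketch)."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scored = []
    for i, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher weight; the "1 +" keeps it non-negative.
            idf = math.log(1.0 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates, and long documents are penalized.
            denom = tf[term] + k1 * (1.0 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / denom
        scored.append((score, i))

    # Highest score first; ties are broken by document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]
```

Here `k1` controls how quickly repeated occurrences of a term stop adding to the score, and `b` controls how strongly document length is normalized.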