Giter Club home page Giter Club logo

npeet's Introduction

NPEET

Non-parametric Entropy Estimation Toolbox

This package contains Python code implementing several entropy estimation functions for both discrete and continuous variables. Information theory provides a model-free way find structure in complex systems, but difficulties in estimating these quantities has traditionally made these techniques infeasible. This package attempts to allay these difficulties by making modern state-of-the-art entropy estimation methods accessible in a single easy-to-use python library.

The implementation is very simple. It only requires that numpy/scipy be installed. It includes estimators for entropy, mutual information, and conditional mutual information for both continuous and discrete variables. Additionally it includes a KL Divergence estimator for continuous distributions and mutual information estimator between continuous and discrete variables along with some non-parametric tests for evaluating estimator performance.

The main documentation is in npeet_doc.pdf. It includes description of functions, references, implementation details, and technical discussion about the difficulties in estimating entropies. The code is available here. It requires scipy 0.12 or greater. This package is mainly geared to estimating information-theoretic quantities for continuous variables in a non-parametric way. If your primary interest is in discrete entropy estimation, particularly with undersampled data, please see this package.

Example installation and usage:

git clone https://github.com/gregversteeg/NPEET.git

>>> import entropy_estimators as ee
>>> x = [[1.3],[3.7],[5.1],[2.4],[3.4]]
>>> y = [[1.5],[3.32],[5.3],[2.3],[3.3]]
>>> ee.mi(x,y)
Out: 0.168

Another example:

import numpy as np
import entropy_estimators as ee

my_data = np.genfromtxt('my_file.csv', delimiter=',')  # If you look in the documentation, there is a way to skip header rows and other things

x = my_data[:,[5]].tolist()
y = my_data[:,[9]].tolist()
z = my_data[:,[15,17]].tolist()
print ee.cmi(x, y, z)
print ee.shuffle_test(ee.cmi, x, y, z, ci=0.95, ns=1000)

This prints the mutual information between column 5 and 9, conditioned on columns 15 and 17. You can also use the function shuffle_test to return confidence intervals for any estimator. Shuffle_test returns the mean CMI under the null hypothesis (CMI=0), and 95% confidence intervals, estimated using 1000 random permutations of the data. Note that we converted the numpy arrays to lists! The current version really works only on python lists (lists of lists actually, as in the first example.

See documentation for references on all implemented estimators.

			A Kraskov, H Stögbauer, P Grassberger. 
			http://pre.aps.org/abstract/PRE/v69/i6/e066138
			Estimating Mutual Information
			PRE 2004.

			Greg Ver Steeg and Aram Galstyan 
			http://lanl.arxiv.org/abs/1208.4475
			Information-Theoretic Measures of Influence Based on Content Dynamics
			WSDM, 2013.

			Greg Ver Steeg and Aram Galstyan 
			http://arxiv.org/abs/1110.2724 
			Information Transfer in Social Media
			WWW, 2012.

npeet's People

Contributors

gregversteeg avatar naught101 avatar

Watchers

 avatar James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.