
bmi's Introduction


Benchmarking Mutual Information

BMI is a package for estimating mutual information between continuous random variables and for testing new estimators.

Getting started

While we recommend taking a look at the documentation to learn about the full package capabilities, below we present the basics of the Python package. (Note that BMI can also be used to test non-Python mutual information estimators.)

You can install the package using:

$ pip install benchmark-mi

Alternatively, you can use the development version from source using:

$ pip install "bmi @ https://github.com/cbg-ethz/bmi"

Note: BMI uses JAX and by default installs the CPU version of it. If you have a device supporting CUDA, you can install the CUDA version of JAX.
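For example, with recent JAX releases and a CUDA 12 setup, something along the following lines may work (consult the JAX installation guide for the exact command matching your CUDA version):

$ pip install --upgrade "jax[cuda12]"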

Now let's take one of the predefined distributions included in the benchmark (named "tasks") and sample 1,000 data points. Then, we will run two estimators on this task.

import bmi

task = bmi.benchmark.BENCHMARK_TASKS['1v1-normal-0.75']
print(f"Task {task.name} with dimensions {task.dim_x} and {task.dim_y}")
print(f"Ground truth mutual information: {task.mutual_information:.2f}")

X, Y = task.sample(1000, seed=42)

cca = bmi.estimators.CCAMutualInformationEstimator()
print(f"Estimate by CCA: {cca.estimate(X, Y):.2f}")

ksg = bmi.estimators.KSGEnsembleFirstEstimator(neighborhoods=(5,))
print(f"Estimate by KSG: {ksg.estimate(X, Y):.2f}")

Evaluating a new estimator

The above code snippet may be convenient for estimating mutual information on a given data set or for the development of a new mutual information estimator. However, for extensive benchmarking it may be more convenient to use one of the benchmark suites available in the workflows/benchmark/ subdirectory.

For example, you can install Snakemake and run a small benchmark suite on several estimators using:

$ snakemake -c4 -s workflows/benchmark/demo/run.smk

In about a minute it should generate minibenchmark results in the generated/benchmark/demo directory. Note that the configuration file, workflows/benchmark/demo/config.py, explicitly defines the estimators and tasks used, as well as the number of samples.

Hence, it is easy to benchmark a custom estimator by importing it and including it in the configuration dictionary. More information is available here, where we cover evaluating new Python as well as non-Python estimators.
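For illustration, a hypothetical sketch of such a configuration entry (the dictionary name and the custom estimator below are illustrative; the actual structure is defined in workflows/benchmark/demo/config.py):

import bmi
from my_package import MyEstimator  # hypothetical custom estimator implementing the BMI interface

# Hypothetical dictionary mapping estimator names to estimator instances.
ESTIMATORS = {
    "KSG-5": bmi.estimators.KSGEnsembleFirstEstimator(neighborhoods=(5,)),
    "CCA": bmi.estimators.CCAMutualInformationEstimator(),
    "My estimator": MyEstimator(),  # benchmarked alongside the built-in ones
}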

Similarly, it is easy to change the number of samples or adjust the tasks included in the benchmark. We defined several benchmark suites with shared structure.

List of implemented estimators

(Your estimator can be here too! Please reach out to us if you would like to contribute.)

Citing

If you find this code useful in your research, consider citing our manuscript:

@inproceedings{beyond-normal-2023,
 title = {Beyond Normal: On the Evaluation of Mutual Information Estimators},
 author = {Czy\.{z}, Pawe{\l}  and Grabowski, Frederic and Vogt, Julia and Beerenwinkel, Niko and Marx, Alexander},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {A. Oh and T. Naumann and A. Globerson and K. Saenko and M. Hardt and S. Levine},
 pages = {16957--16990},
 publisher = {Curran Associates, Inc.},
 url = {https://proceedings.neurips.cc/paper_files/paper/2023/file/36b80eae70ff629d667f210e13497edf-Paper-Conference.pdf},
 volume = {36},
 year = {2023}
}

bmi's People

Contributors

grfrederic, pawel-czyz


bmi's Issues

1D + 1D visualisation

Utilities to visualise the joint distribution of two 1-dimensional variables. Probably a thin wrapper around Seaborn.
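A minimal sketch of what such a wrapper could look like, assuming it simply delegates to seaborn.jointplot (the function name is hypothetical):

import seaborn as sns

def plot_1v1(x, y, **kwargs):
    """Visualise the joint distribution of two 1-dimensional samples."""
    # jointplot draws the scatter plot together with the marginal histograms.
    return sns.jointplot(x=x.ravel(), y=y.ravel(), **kwargs)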

Better tests for estimators

It would be nice to organize our estimator tests, that is, to have a generic testing function (a sketch is given after the list below):

test_estimator_on_task(estimator, task, n_samples, seed, abs_error, rel_error)

and then use it to build our tests:

  1. a generic/easy group for all estimators
  2. R and Julia estimators can be optionally tested (with the same tests?)
  3. longer/advanced tests for estimators we expect to perform well (especially KSG and neural estimators)
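A minimal sketch of such a generic helper (pytest-style; the exact signature and how the tolerances are combined are still to be decided):

def test_estimator_on_task(estimator, task, n_samples, seed, abs_error, rel_error):
    # Sample from the task and compare the estimate with the ground-truth MI.
    x, y = task.sample(n_samples, seed=seed)
    estimate = estimator.estimate(x, y)
    true_mi = task.mutual_information
    assert abs(estimate - true_mi) <= max(abs_error, rel_error * true_mi)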

Update github readme

  1. Include info about available estimators
  2. Show basic examples of running a given estimator on a given task

Change the package API

This PR proposes how to refactor the package so that it is easier to use.

Benchmark tasks

Instead of storing values generated for a range of seeds, a task will act more like a named sampler. Currently, tasks are objects holding several samples.

> xs, ys = task.sample(n_samples=5000, seed=42)

> task.id  # unique
mn_sparse_3x3

> task.name  # pretty
Multinormal (sparse) 3 × 3

> task.params
(serializable info about the task)

> task.save_metadata('path/to/save.yaml')

> task.save_sample('path/to/save.csv', n_samples=5000, seed=42)
(includes info from above)

> from bmi.tasks import read_sample
> x, y = read_sample('path/to/read')

Dumping a task could be a functionality in the benchmark, for example:

> from bmi.tasks import dump_task
> dump_task('path/', task, seeds=[0, 1, 2], samples=[1000, 2000])

should create:

path/
  task_id/
    metadata.yaml
    samples/
      1000-0.csv
      1000-1.csv
      1000-2.csv
      2000-0.csv
      2000-1.csv
      2000-2.csv

We need an official dictionary of tasks, BENCHMARK_TASKS:

> from bmi.benchmark import BENCHMARK_TASKS
> task = BENCHMARK_TASKS['some_task_id']

We can have a script for non-Python users that allows
easy task generation:

$ python generate_task.py TASK_ID SEEDS SAMPLES PATH

Estimators

We wrap external estimators so they behave like regular estimators, by saving the needed sample on the fly (ideally in /run/user/$uid, /tmp/, or some other ramdisk).

> from bmi.estimators import InfoNCEEstimator
> from bmi.estimators import JuliaTransferEstimator
> from bmi.benchmark import BENCHMARK_TASKS
> task = BENCHMARK_TASKS['some_task_id']
> xs, ys = task.sample(5000, 0)
> InfoNCEEstimator().estimate(xs, ys)
> JuliaTransferEstimator().estimate(xs, ys)

Benchmark

We want benchmarks to be easily run and configured through Snakemake. This is out of scope for this issue, but we should keep it in mind.

Rename fine distributions to BMMs

As proposed by @grfrederic, this PR updates the naming conventions.

Tasks:

  • Change the import in the package. (from ... import ... as bmm rather than from ... import ... as fine)
  • Update unit tests.
  • Update Snakemake workflows.
  • Update the documentation: check if API is rendered properly.
  • Update the documentation: adjust the tutorial.

Add dropout to neural estimators

MINE, InfoNCE, and other neural estimators could use the random state to apply dropout. This requires changes in the training loop and some refactoring.

Adaptive histograms

Implement an adaptive binning strategy for the histogram-based MI estimator. Also, consider binning by the number of samples and estimating the bin volumes, rather than the current strategy (equally sized bins with varying numbers of samples per bin).
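As one possible adaptive strategy, a minimal sketch of equal-frequency (quantile) binning along a single axis (this is not the current implementation):

import numpy as np

def quantile_bin_indices(samples, n_bins):
    """Assign each sample to one of n_bins bins holding roughly equal numbers of samples."""
    edges = np.quantile(samples, np.linspace(0.0, 1.0, n_bins + 1))
    # Drop the outermost edges so that np.digitize returns indices 0 .. n_bins - 1.
    return np.digitize(samples, edges[1:-1])

# Example: bin a 1D marginal into 10 equal-frequency bins.
x = np.random.default_rng(0).standard_normal(1000)
indices = quantile_bin_indices(x, n_bins=10)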

Minibenchmarks

Minibenchmarks/specific problems one can encounter

  • Sparsity of interactions.
  • Spiral plots:
    • We know that as the spiral speed increases the task gets harder and performance drops.
    • We compare the performance of different estimators with each other.
    • Mention that this is probably because the spiral breaks collinearities and neighborhoods, so the PDF is trickier to model.
      • Add to discussion: neural estimators may not be able to model hard density functions.
  • High MI is hard to estimate.
  • Do tails matter?
    • Apply the |x|^(1+a) homeomorphism and vary a. For each a, consider a distribution and its asinh-transformed, uniformized, and "standardized" versions.
    • See what happens for Student t with different numbers of degrees of freedom.
  • How to normalize?
    • Asinh transformation, uniformization, and standardization (maybe not with a normalizing flow, but with uniformization followed by applying the normal quantile function along each axis).

Generated results:

  • npoints -> estimator -> task -> MI estimate
    • what we plot: estimator -> task -> f(npoints, MI estimate)
  • npoints = 5k, estimator, task, preprocessing -> MI estimate
    • some estimators, some tasks
  1. Do tails matter?
    • Student t vs normal
    • Half-cube
    • "Detailing" with async

Figures order

One proposition

  • Demonstration of distributions
  • Benchmark figure
  • Specific issues ("minibenchmarks")

Another one

  • Demonstration of distributions
  • Specific issues
  • Benchmark figure

Conditional mutual information

Add conditional MI estimators and samplers.

Note that we have the chain rule:
$$I(X; Y, Z) = I(X; Z) + I(X; Y\mid Z).$$
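For example, given an unconditional MI estimator, the chain rule already suggests a simple plug-in construction (a sketch only; this is not an existing BMI API, and the two estimation errors may compound):

import numpy as np

def conditional_mi_estimate(estimator, x, y, z):
    """Estimate I(X; Y | Z) as I(X; Y, Z) - I(X; Z) via the chain rule."""
    yz = np.concatenate([y, z], axis=1)
    return estimator.estimate(x, yz) - estimator.estimate(x, z)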

Error raised during smoothing if training is too short

There is a bug in smoothing the training:

src/bmi/estimators/neural/_mine_estimator.py:357: in estimate
    return self.estimate_with_info(x, y).mi_estimate
src/bmi/estimators/neural/_mine_estimator.py:335: in estimate_with_info
    training_log, trained_critic = mine_training(
src/bmi/estimators/neural/_mine_estimator.py:248: in mine_training
    training_log.finish()
src/bmi/estimators/neural/_training_log.py:107: in finish
    self.detect_warnings()
src/bmi/estimators/neural/_training_log.py:120: in detect_warnings
    train_mi_smooth = (cs[w:] - cs[:-w]) / w
jax/_src/numpy/lax_numpy.py:5071: in deferring_binary_op
    return binary_op(*args)

which arises when the training is too short. I added a TODO in _training_log.py:

        # TODO(Pawel, Frederic): If training smooth window is too
        #   long we will have an error that subtraction between (n,)
        #   and (0,) arrays cannot be performed.
        train_mi_smooth = (cs[w:] - cs[:-w]) / w
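One possible fix (a sketch): skip the smoothing check when the recorded history is shorter than the smoothing window, e.g.

        # Possible guard in detect_warnings(), before the subtraction:
        if w < 1 or len(cs) <= w:
            return  # history too short for the smoothing window; skip this check
        train_mi_smooth = (cs[w:] - cs[:-w]) / w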

Changes to the manuscript

  • Use the NeurIPS template.
  • Cite the variational estimators literature (Poole et al., Song and Ermon, McAllester and Stratos)
  • Benchmark with tasks created using the BMM models to have a table in the appendix.
  • Make Section 4 a Subsection 3.4.
  • Revisit the introduction.
  • Revisit the discussion and link to the appendix for more results.
  • Answer NeurIPS checklist questions.
  • Update the arXiv version.

Add MI estimators in R

As @a-marx told me, KSG, G-KSG, gKNN, and LNN are implemented in this repository. For the demo, look here, at lines 97–113.

We can add it as a git submodule and plug it into our framework by creating an appropriate wrapper script in R. It is probably best to parametrize it with argparse.

Improve the documentation

There's room for improvement in the documentation:

  • Add a picture of the benchmark to the ReadMe/docs.
  • Explicitly list the estimators and cite the relevant references (see #135). Some ideas:
    • Add estimator.cite() method.
    • Add the citations to the documentation (e.g., to the webpage listing the existing estimators by including them in the docstrings).

Additionally, the following tutorial sections would be useful to add to the documentation:

  • How to use the samplers
  • How to use the tasks.
  • How to use the estimators. (See #135)
  • How to add a new estimator. (See #134, #135)
  • How to define and use the fine distributions. (See #138)
  • How to use Snakemake workflows.

A possible suggestion for how to structure things: https://omnibenchmark.org/

Clean up imports

We've made a lot of changes when moving to our new tasks/benchmark API. It would be nice to rethink which tasks, estimators, etc. should be exported by default. For example:

  1. When importing bmi.benchmark, should the functions for creating tasks (which live in bmi.benchmark.tasks) be re-exported there, so that users create the tasks themselves, or should the tasks from the benchmark list be re-exported under convenient names?
  2. Should external estimators be exported separately, or included in bmi.estimators? We could go with the latter and raise warnings when someone tries to initialize an estimator and R, Julia, or another necessary package is not installed.

Principled approach to handling NaNs

In #110 there was an issue where NaNs appeared.

I think this may be a problem of numerical approximations: sometimes we may have $p(x, y)\approx 0$, so that numerically $\log p(x, y) = -\text{inf}$.

If all $\log p(x, y)$, $\log p(x)$, and $\log p(y)$ evaluate to $-\text{inf}$, then PMI evaluates to NaN.

I asked ChatGPT and it suggested the following construction:

import jax.numpy as jnp

def custom_subtract(a, b, c):
    # Calculate the result for a - (b + c)
    result = a - (b + c)

    # Create a mask for the special case when all inputs are -inf
    mask = jnp.logical_and(jnp.logical_and(a == -jnp.inf, b == -jnp.inf), c == -jnp.inf)

    # Return 0 where the mask is True, and the original result otherwise
    return jnp.where(mask, 0.0, result)

# Test
a = jnp.array(-float('inf'))
b = jnp.array(-float('inf'))
c = jnp.array(-float('inf'))

print(custom_subtract(a, b, c))  # Should print 0.0

Fix importing

So that we don't have to manually import like this:
import bmi.samplers.SplitMultinormal
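After the fix, something like this should be enough (a sketch of the desired behaviour, not the current one):

import bmi

# The sampler should be reachable via the top-level package without extra imports.
sampler_cls = bmi.samplers.SplitMultinormal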

Implement multivariate Student-t sampler

The Student-t distribution has a multivariate generalization (which, for $\nu \gg 2$, is also similar to the normal).

  1. Sampling can be efficiently implemented as described here (a sketch is given below).
  2. Mutual information can be calculated analytically, as described in
    R.B. Arellano-Valle et al., Shannon Entropy and Mutual Information for Multivariate Skew-Elliptical Distributions, Scandinavian Journal of Statistics, Vol. 40, No. 1 (March 2013), p. 47.

Moreover, the mentioned article describes MI of several other families of distributions, although sampling from them may be tricky and MI calculation may require numerical integration.
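A minimal sketch of the standard construction, rescaling a Gaussian draw by a chi-squared variable (this is not the package's sampler):

import numpy as np

def sample_multivariate_t(mean, cov, df, n_samples, seed=0):
    """Draw n_samples from a multivariate Student-t distribution with df degrees of freedom."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(len(mean)), cov, size=n_samples)
    u = rng.chisquare(df, size=(n_samples, 1))
    # X = mu + Z / sqrt(U / df), with Z ~ N(0, cov) and U ~ chi^2(df).
    return np.asarray(mean) + z / np.sqrt(u / df)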

Wrap the estimators to make them easier to run

Currently we have an interface for Python estimators, taking the X and Y samples, and an ExternalEstimator class, which takes the path to the task.
The latter is very convenient when one loads the tasks from disk. It would be good to abstract it into an interface and make the existing Python estimators implement it (a rough sketch is given after the task list below).

  • Modify the estimator interface to provide parameters.
  • Adjust the existing implementations to provide the parameters.
  • Define a new interface ITaskEstimator, for returning the parameters and estimating the MI using the loaded data.
  • Adjust ExternalEstimator, so it implements the ITaskEstimator interface.
  • Create a factory method which takes an estimator and wraps it into an ITaskEstimator implementation.
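A rough sketch of what the proposed interface could look like (the names follow the task list above; the details are still to be decided):

from abc import ABC, abstractmethod
from pathlib import Path

class ITaskEstimator(ABC):
    @abstractmethod
    def parameters(self) -> dict:
        """Return the (serializable) parameters of the estimator."""

    @abstractmethod
    def estimate_from_path(self, task_path: Path) -> float:
        """Load the sample stored at task_path and estimate the mutual information."""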

Benchmark versioning

Introduce principled versioning of the benchmark, using GitHub releases.

Additional changes:

  • Version number in Python code or ReadMe?
  • Draft the v1.0 release, when it's done.

Allow for Python 3.11 and 3.12

Python 3.12 was released in October 2023, and we currently still have the following pins in pyproject.toml:

# <3.11 because of PyType. Update when it's resolved
# <3.12 because of SciPy. Update when it's resolved
python = ">=3.9,<3.11"

The idea for this issue is to update the dependencies, so that they work with Python 3.11 and 3.12. It's also likely that we can drop 3.9 entirely.

Tasks:

  • Update pyproject.toml, resolving the dependencies appropriately.
  • Update .github/workflows, so that we test against 3.11 and 3.12.

Tests for the Spiral

In #23 we have an example of the spiraling diffeomorphism. Think about whether the API is right and write the tests (currently they are missing).
