
bmi's Introduction


Benchmarking Mutual Information

BMI is a package for estimating mutual information between continuous random variables and for testing new estimators.

Getting started

While we recommend taking a look at the documentation to learn about the full package capabilities, below we present the basics of the Python package. (Note that BMI can also be used to test non-Python mutual information estimators.)

You can install the package using:

$ pip install benchmark-mi

Alternatively, you can use the development version from source using:

$ pip install "bmi @ https://github.com/cbg-ethz/bmi"

Note: BMI uses JAX and by default installs the CPU version of it. If you have a device supporting CUDA, you can install the CUDA version of JAX.
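For example, with recent JAX releases and a CUDA 12 setup, something along the following lines may work (consult the JAX installation guide for the exact command matching your CUDA version):

$ pip install --upgrade "jax[cuda12]"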

Now let's take one of the predefined distributions included in the benchmark (named "tasks") and sample 1,000 data points. Then, we will run two estimators on this task.

import bmi

task = bmi.benchmark.BENCHMARK_TASKS['1v1-normal-0.75']
print(f"Task {task.name} with dimensions {task.dim_x} and {task.dim_y}")
print(f"Ground truth mutual information: {task.mutual_information:.2f}")

X, Y = task.sample(1000, seed=42)

cca = bmi.estimators.CCAMutualInformationEstimator()
print(f"Estimate by CCA: {cca.estimate(X, Y):.2f}")

ksg = bmi.estimators.KSGEnsembleFirstEstimator(neighborhoods=(5,))
print(f"Estimate by KSG: {ksg.estimate(X, Y):.2f}")

Evaluating a new estimator

The above code snippet may be convenient for estimating mutual information on a given data set or for the development of a new mutual information estimator. However, for extensive benchmarking it may be more convenient to use one of the benchmark suites available in the workflows/benchmark/ subdirectory.

For example, you can install Snakemake and run a small benchmark suite on several estimators using:

$ snakemake -c4 -s workflows/benchmark/demo/run.smk

In about a minute it should generate minibenchmark results in the generated/benchmark/demo directory. Note that the configuration file, workflows/benchmark/demo/config.py, explicitly defines the estimators and tasks used, as well as the number of samples.

Hence, it is easy to benchmark a custom estimator by importing it and including it in the configuration dictionary. More information is available here, where we cover evaluating new Python as well as non-Python estimators.
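For illustration, a hypothetical sketch of such a configuration entry (the dictionary name and the custom estimator below are illustrative; the actual structure is defined in workflows/benchmark/demo/config.py):

import bmi
from my_package import MyEstimator  # hypothetical custom estimator implementing the BMI interface

# Hypothetical dictionary mapping estimator names to estimator instances.
ESTIMATORS = {
    "KSG-5": bmi.estimators.KSGEnsembleFirstEstimator(neighborhoods=(5,)),
    "CCA": bmi.estimators.CCAMutualInformationEstimator(),
    "My estimator": MyEstimator(),  # benchmarked alongside the built-in ones
}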

Similarly, it is easy to change the number of samples or adjust the tasks included in the benchmark. We defined several benchmark suites with shared structure.

List of implemented estimators

(Your estimator can be here too! Please reach out to us if you would like to contribute.)

Citing

If you find this code useful in your research, consider citing our manuscript:

@inproceedings{beyond-normal-2023,
 title = {Beyond Normal: On the Evaluation of Mutual Information Estimators},
 author = {Czy\.{z}, Pawe{\l}  and Grabowski, Frederic and Vogt, Julia and Beerenwinkel, Niko and Marx, Alexander},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {A. Oh and T. Naumann and A. Globerson and K. Saenko and M. Hardt and S. Levine},
 pages = {16957--16990},
 publisher = {Curran Associates, Inc.},
 url = {https://proceedings.neurips.cc/paper_files/paper/2023/file/36b80eae70ff629d667f210e13497edf-Paper-Conference.pdf},
 volume = {36},
 year = {2023}
}

bmi's People

Contributors

grfrederic, pawel-czyz


bmi's Issues

1D + 1D visualisation

Utilities to visualise the joint distribution of two 1-dimensional variables. Probably a thin wrapper around Seaborn.
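A minimal sketch of what such a wrapper could look like, assuming it simply delegates to seaborn.jointplot (the function name is hypothetical):

import seaborn as sns

def plot_1v1(x, y, **kwargs):
    """Visualise the joint distribution of two 1-dimensional samples."""
    # jointplot draws the scatter plot together with the marginal histograms.
    return sns.jointplot(x=x.ravel(), y=y.ravel(), **kwargs)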

Better tests for estimators

It would be nice to organize our estimator tests, that is, to have a generic testing function (a sketch is given after the list below):

test_estimator_on_task(estimator, task, n_samples, seed, abs_error, rel_error)

and then use it to build our tests:

  1. a generic/easy group for all estimators
  2. R and Julia estimators can be optionally tested (with the same tests?)
  3. longer/advanced tests for estimators we expect to perform well (especially KSG and neural estimators)
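A minimal sketch of such a generic helper (pytest-style; the exact signature and how the tolerances are combined are still to be decided):

def test_estimator_on_task(estimator, task, n_samples, seed, abs_error, rel_error):
    # Sample from the task and compare the estimate with the ground-truth MI.
    x, y = task.sample(n_samples, seed=seed)
    estimate = estimator.estimate(x, y)
    true_mi = task.mutual_information
    assert abs(estimate - true_mi) <= max(abs_error, rel_error * true_mi)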

Update github readme

  1. Include info about available estimators
  2. Show basic examples of running a given estimator on a given task

Change the package API

This PR proposes how to refactor the package so that it is easier to use.

Benchmark tasks

Instead of storing values generated for a range of seeds, a task will act more like a named sampler. Currently, tasks are objects holding several samples.

> xs, ys = task.sample(n_samples=5000, seed=42)

> task.id  # unique
mn_sparse_3x3

> task.name  # pretty
Multinormal (sparse) 3 × 3

> task.params
(serializable info about the task)

> task.save_metadata('path/to/save.yaml')

> task.save_sample('path/to/save.csv', n_samples=5000, seed=42)
(includes info from above)

> from bmi.tasks import read_sample
> x, y = read_sample('path/to/read')

Dumping a task could be a functionality in the benchmark, for example:

> from bmi.tasks import dump_task
> dump_task('path/', task, seeds=[0, 1, 2], samples=[1000, 2000])

should create:

path/
  task_id/
    metadata.yaml
    samples/
      1000-0.csv
      1000-1.csv
      1000-2.csv
      2000-0.csv
      2000-1.csv
      2000-2.csv

We need an official dictionary of tasks, BENCHMARK_TASKS:

> from bmi.benchmark import BENCHMARK_TASKS
> task = BENCHMARK_TASKS['some_task_id']

We can have a script for non-Python users that allows
easy task generation:

$ python generate_task.py TASK_ID SEEDS SAMPLES PATH

Estimators

We wrap external estimators so they behave like regular estimators, by saving the needed sample on the fly (ideally in /run/user/$uid, /tmp/, or some other ramdisk).

> from bmi.estimators import InfoNCEEstimator
> from bmi.estimators import JuliaTransferEstimator
> from bmi.benchmark import BENCHMARK_TASKS
> task = BENCHMARK_TASKS['some_task_id']
> xs, ys = task.sample(5000, 0)
> InfoNCEEstimator().estimate(xs, ys)
> JuliaTransferEstimator().estimate(xs, ys)

Benchmark

We want benchmarks to be easily run and configured through Snakemake. This is out of scope for this issue, but we should keep it in mind.

Rename fine distributions to BMMs

As proposed by @grfrederic, this PR updates the naming conventions.

Tasks:

  • Change the import in the package. (from ... import ... as bmm rather than from ... import ... as fine)
  • Update unit tests.
  • Update Snakemake workflows.
  • Update the documentation: check if API is rendered properly.
  • Update the documentation: adjust the tutorial.

Add dropout to neural estimators

MINE, InfoNCE, and other neural estimators could use the random state to apply dropout. This requires changes in the training loop and some refactoring.

Adaptive histograms

Implement an adaptive binning strategy for the histogram-based MI estimator. Also, consider binning by the number of samples and estimating the bin volumes, rather than the current strategy (equally sized bins with varying numbers of samples per bin).
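As one possible adaptive strategy, a minimal sketch of equal-frequency (quantile) binning along a single axis (this is not the current implementation):

import numpy as np

def quantile_bin_indices(samples, n_bins):
    """Assign each sample to one of n_bins bins holding roughly equal numbers of samples."""
    edges = np.quantile(samples, np.linspace(0.0, 1.0, n_bins + 1))
    # Drop the outermost edges so that np.digitize returns indices 0 .. n_bins - 1.
    return np.digitize(samples, edges[1:-1])

# Example: bin a 1D marginal into 10 equal-frequency bins.
x = np.random.default_rng(0).standard_normal(1000)
indices = quantile_bin_indices(x, n_bins=10)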

Minibenchmarks

Minibenchmarks/specific problems one can encounter

  • Sparsity of interactions.
  • Spiral plots:
    • We know that as the spiral speed increases the task gets harder and performance drops.
    • We compare the performance of different estimators with each other.
    • Mention that this is probably because the spiral breaks collinearities and neighborhoods, so the PDF is trickier to model.
      • Add to discussion: neural estimators may not be able to model hard density functions.
  • High MI is hard to estimate.
  • Do tails matter?
    • Apply the |x|^(1+a) homeomorphism and vary a. For each a, consider a distribution and its asinh-transformed, uniformized, and "standardized" versions.
    • See what happens for Student t with different numbers of degrees of freedom.
  • How to normalize?
    • Asinh transformation, uniformization, and standardization (maybe not with a normalizing flow, but with uniformization followed by applying the normal quantile function along each axis).

Generated results:

  • npoints -> estimator -> task -> MI estimate
    • what we plot: estimator -> task -> f(npoints, MI estimate)
  • npoints = 5k, estimator, task, preprocessing -> MI estimate
    • some estimators, some tasks
  1. Do tails matter?
    • Student t vs normal
    • Half-cube
    • "Detailing" with async

Figures order

One proposition

  • Demonstration of distributions
  • Benchmark figure
  • Specific issues ("minibenchmarks")

Another one

  • Demonstration of distributions
  • Specific issues
  • Benchmark figure

Conditional mutual information

Add conditional MI estimators and samplers.

Note that we have the chain rule:
$$I(X; Y, Z) = I(X; Z) + I(X; Y\mid Z).$$
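For example, given an unconditional MI estimator, the chain rule already suggests a simple plug-in construction (a sketch only; this is not an existing BMI API, and the two estimation errors may compound):

import numpy as np

def conditional_mi_estimate(estimator, x, y, z):
    """Estimate I(X; Y | Z) as I(X; Y, Z) - I(X; Z) via the chain rule."""
    yz = np.concatenate([y, z], axis=1)
    return estimator.estimate(x, yz) - estimator.estimate(x, z)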

Error raised during smoothing if training is too short

There is a bug in smoothing the training:

src/bmi/estimators/neural/_mine_estimator.py:357: in estimate
    return self.estimate_with_info(x, y).mi_estimate
src/bmi/estimators/neural/_mine_estimator.py:335: in estimate_with_info
    training_log, trained_critic = mine_training(
src/bmi/estimators/neural/_mine_estimator.py:248: in mine_training
    training_log.finish()
src/bmi/estimators/neural/_training_log.py:107: in finish
    self.detect_warnings()
src/bmi/estimators/neural/_training_log.py:120: in detect_warnings
    train_mi_smooth = (cs[w:] - cs[:-w]) / w
jax/_src/numpy/lax_numpy.py:5071: in deferring_binary_op
    return binary_op(*args)

which arises when the training is too short. I added a TODO in _training_log.py:

        # TODO(Pawel, Frederic): If training smooth window is too
        #   long we will have an error that subtraction between (n,)
        #   and (0,) arrays cannot be performed.
        train_mi_smooth = (cs[w:] - cs[:-w]) / w
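One possible fix (a sketch): skip the smoothing check when the recorded history is shorter than the smoothing window, e.g.

        # Possible guard in detect_warnings(), before the subtraction:
        if w < 1 or len(cs) <= w:
            return  # history too short for the smoothing window; skip this check
        train_mi_smooth = (cs[w:] - cs[:-w]) / w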

Changes to the manuscript

  • Use the NeurIPS template.
  • Cite the variational estimators literature (Poole et al., Song and Ermon, McAllester and Stratos)
  • Benchmark with tasks created using the BMM models to have a table in the appendix.
  • Make Section 4 a Subsection 3.4.
  • Revisit the introduction.
  • Revisit the discussion and link to the appendix for more results.
  • Answer NeurIPS checklist questions.
  • Update the arXiv version.

Add MI estimators in R

As @a-marx told me, KSG, G-KSG, gKNN, and LNN are implemented in this repository. For the demo, look here, at lines 97–113.

We can add it as a git submodule and plug it into our framework by creating an appropriate wrapper script in R. It is probably best to parametrize it with argparse.

Improve the documentation

There's room for improvement in the documentation:

  • Add a picture of the benchmark to the ReadMe/docs.
  • Explicitly list the estimators and cite the relevant references (see #135). Some ideas:
    • Add estimator.cite() method.
    • Add the citations to the documentation (e.g., to the webpage listing the existing estimators by including them in the docstrings).

Additionally, the following tutorial sections would be useful to add to the documentation:

  • How to use the samplers
  • How to use the tasks.
  • How to use the estimators. (See #135)
  • How to add a new estimator. (See #134, #135)
  • How to define and use the fine distributions. (See #138)
  • How to use Snakemake workflows.

A possible suggestion for how to structure things: https://omnibenchmark.org/

Clean up imports

We've made a lot of changes when moving to our new tasks/benchmark API. It would be nice to rethink which tasks, estimators, etc. should be exported by default. For example:

  1. When importing bmi.benchmark, should the functions for creating tasks (which live in bmi.benchmark.tasks) be re-exported there, so that users create the tasks themselves, or should the tasks from the benchmark list be re-exported under convenient names?
  2. Should external estimators be exported separately, or included in bmi.estimators? We could go with the latter and raise warnings when someone tries to initialize an estimator and R, Julia, or another necessary package is not installed.

Principled approach to handling NaNs

In #110 there was an issue where NaNs appeared.

I think this may be a problem of numerical approximations: sometimes we may have $p(x, y)\approx 0$, so that numerically $\log p(x, y) = -\text{inf}$.

If all $\log p(x, y)$, $\log p(x)$, and $\log p(y)$ evaluate to $-\text{inf}$, then PMI evaluates to NaN.

I asked ChatGPT and it suggested the following construction:

import jax.numpy as jnp

def custom_subtract(a, b, c):
    # Calculate the result for a - (b + c)
    result = a - (b + c)

    # Create a mask for the special case when all inputs are -inf
    mask = jnp.logical_and(jnp.logical_and(a == -jnp.inf, b == -jnp.inf), c == -jnp.inf)

    # Return 0 where the mask is True, and the original result otherwise
    return jnp.where(mask, 0.0, result)

# Test
a = jnp.array(-float('inf'))
b = jnp.array(-float('inf'))
c = jnp.array(-float('inf'))

print(custom_subtract(a, b, c))  # Should print 0.0

Fix importing

So that we don't have to manually import like this:
import bmi.samplers.SplitMultinormal
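After the fix, something like this should be enough (a sketch of the desired behaviour, not the current one):

import bmi

# The sampler should be reachable via the top-level package without extra imports.
sampler_cls = bmi.samplers.SplitMultinormal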

Implement multivariate Student-t sampler

The Student-t distribution has a multivariate generalization (which, for $\nu \gg 2$, is also similar to the normal).

  1. Sampling can be efficiently implemented as described here (a sketch is given below).
  2. Mutual information can be calculated analytically, as described in
    R.B. Arellano-Valle et al., Shannon Entropy and Mutual Information for Multivariate Skew-Elliptical Distributions, Scandinavian Journal of Statistics, Vol. 40, No. 1 (March 2013), p. 47.

Moreover, the mentioned article describes MI of several other families of distributions, although sampling from them may be tricky and MI calculation may require numerical integration.
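A minimal sketch of the standard construction, rescaling a Gaussian draw by a chi-squared variable (this is not the package's sampler):

import numpy as np

def sample_multivariate_t(mean, cov, df, n_samples, seed=0):
    """Draw n_samples from a multivariate Student-t distribution with df degrees of freedom."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(len(mean)), cov, size=n_samples)
    u = rng.chisquare(df, size=(n_samples, 1))
    # X = mu + Z / sqrt(U / df), with Z ~ N(0, cov) and U ~ chi^2(df).
    return np.asarray(mean) + z / np.sqrt(u / df)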

Wrap the estimators to make them easier to run

Currently we have an interface for Python estimators, taking the X and Y samples, and an ExternalEstimator class, which takes the path to the task.
The latter is very convenient when one loads the tasks from disk. It would be good to abstract it into an interface and make the existing Python estimators implement it (a rough sketch is given after the task list below).

  • Modify the estimator interface to provide parameters.
  • Adjust the existing implementations to provide the parameters.
  • Define a new interface ITaskEstimator, for returning the parameters and estimating the MI using the loaded data.
  • Adjust ExternalEstimator, so it implements the ITaskEstimator interface.
  • Create a factory method which takes an estimator and wraps it into an ITaskEstimator implementation.
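A rough sketch of what the proposed interface could look like (the names follow the task list above; the details are still to be decided):

from abc import ABC, abstractmethod
from pathlib import Path

class ITaskEstimator(ABC):
    @abstractmethod
    def parameters(self) -> dict:
        """Return the (serializable) parameters of the estimator."""

    @abstractmethod
    def estimate_from_path(self, task_path: Path) -> float:
        """Load the sample stored at task_path and estimate the mutual information."""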

Benchmark versioning

Introduce principled versioning of the benchmark, using GitHub releases.

Additional changes:

  • Version number in Python code or ReadMe?
  • Draft the v1.0 release, when it's done.

Allow for Python 3.11 and 3.12

Python 3.12 was released in October 2023, and we currently still have the following pins in pyproject.toml:

# <3.11 because of PyType. Update when it's resolved
# <3.12 because of SciPy. Update when it's resolved
python = ">=3.9,<3.11"

The idea for this issue is to update the dependencies, so that they work with Python 3.11 and 3.12. It's also likely that we can drop 3.9 entirely.

Tasks:

  • Update pyproject.toml, resolving the dependencies appropriately.
  • Update .github/workflows, so that we test against 3.11 and 3.12.

Tests for the Spiral

In #23 we have an example of the spiraling diffeomorphism. Think about whether the API is right and write the tests (currently they are missing).
