Giter Club home page Giter Club logo

hdmedians's Introduction

Hdmedians

Did you know there is no unique way to mathematically extend the concept of a median to higher dimensions?

Various definitions for a high-dimensional median exist and this Python package provides a number of fast implementations of these definitions. Medians are extremely useful due to their high breakdown point (up to 50% contamination) and have a number of nice applications in machine learning, computer vision, and high-dimensional statistics.

This package currently has implementations of medoid and geometric median with support for missing data using NaN.

Installation

The latest version of the package is always available on pypi, so can be easily installed by typing:

pip3 install hdmedians

Medoid

Given a finite set of -dimensional observation vectors , the medoid of these observations is given by

The current implementation of medoid is in vectorized Python and can handle any data type supported by ndarray. If you would like the algorithm to take care of missing values encoded as nan then you can use the nanmedoid function.

Examples

Create an 6 x 10 array of random integer observations.

>>> import numpy as np
>>> X = np.random.randint(100, size=(6, 10))
array([[12,  9, 61, 76,  2, 17, 12, 11, 26,  0],
       [65, 72,  7, 64, 21, 92, 51, 48,  9, 65],
       [39,  7, 50, 56, 29, 79, 47, 45, 10, 52],
       [70, 12, 23, 97, 86, 14, 42, 90, 15, 16],
       [13,  7,  2, 47, 80, 53, 23, 59,  7, 15],
       [83,  2, 40, 12, 22, 75, 69, 61, 28, 53]])

Find the medoid, taking the last axis as the number of observations.

>>> import hdmedians as hd
>>> hd.medoid(X)
array([12, 51, 47, 42, 23, 69])

Take the first axis as the number of observations.

>>> hd.medoid(X, axis=0)
array([39,  7, 50, 56, 29, 79, 47, 45, 10, 52])

Since the medoid is one of the observations, the medoid function has the ability to only return the index if required.

>>> hd.medoid(X, indexonly=True)
6
>>> X[:,6]
array([12, 51, 47, 42, 23, 69])

Geometric Median

The geometric median is also known as the 1-median, spatial median, Euclidean minisum, or Torricelli point. Given a finite set of -dimensional observation vectors , the geometric median of these observations is given by

Note there is a subtle difference between the definition of the geometric median and the medoid: the search space for the solution differs and has the effect that the medoid returns one of the true observations whereas the geometric median can be described as a synthetic (not physically observed) observation.

The current implementation of geomedian uses Cython and can handle float64 or float32. If you would like the algorithm to take care of missing values encoded as nan then you can use the nangeomedian function.

Examples

Create an 6 x 10 array of random float64 observations.

>>> import numpy as np
>>> np.set_printoptions(precision=4, linewidth=200)
>>> X = np.random.normal(1, size=(6, 10))
array([[ 1.1079,  0.5763,  0.3072,  1.2205,  0.8596, -1.5082,  2.5955,  2.8251,  1.5908,  0.4575],
       [ 1.555 ,  1.7903,  1.213 ,  1.1285,  0.0461, -0.4929, -0.1158,  0.5879,  1.5807,  0.5828],
       [ 2.1583,  3.4429,  0.4166,  1.0192,  0.8308, -0.1468,  2.6329,  2.2239,  0.2168,  0.8783],
       [ 0.7382,  1.9453,  0.567 ,  0.6797,  1.1654, -0.1556,  0.9934,  0.1857,  1.369 ,  2.1855],
       [ 0.1727,  0.0835,  0.5416,  1.4416,  1.6921,  1.6636,  1.6421,  1.0687,  0.6075, -0.0301],
       [ 2.6654,  1.6741,  1.1568,  1.3092,  1.6944,  0.2574,  2.8604,  1.6102,  0.4301, -0.3876]])
>>> X.dtype
dtype('float64')

Find the geometric median, taking the last axis as the number of observations.

>>> import hdmedians as hd
>>> np.array(hd.geomedian(X))
array([ 1.0733,  0.8974,  1.1935,  0.9122,  0.9975,  1.3422])

Take the first axis as the number of observations.

>>> np.array(hd.geomedian(X, axis=0))
array([ 1.4581,  1.6377,  0.7147,  1.1257,  1.0493, -0.091 ,  1.7907,  1.4168,  0.9587,  0.6195])

Convert to float32 and compute the geometric median.

>>> X = X.astype(np.float32)
>>> m = hd.geomedian(X)

References

hdmedians's People

Contributors

daleroberts avatar danodonovan avatar lrvdijk avatar stevenjkern avatar eldeveloper avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.