
hdstats

A library of multivariate and high-dimensional statistics and time series algorithms for spatial-temporal stacks.


Geometric median PCM

Generation of geometric median pixel composite mosaics from a stack of data; see example.

If you are using this algorithm in your research or products, please cite:

Roberts, D., Mueller, N., & McIntyre, A. (2017). High-dimensional pixel composites from earth observation time series. IEEE Transactions on Geoscience and Remote Sensing, 55(11), 6254-6264.

Geometric Median Absolute Deviation (MAD) PCM

Accelerated generation of geometric median absolute deviation pixel composite mosaics from a stack of data; see example.

If you are using this algorithm in your research or products, please cite:

Roberts, D., Dunn, B., & Mueller, N. (2018). Open data cube products using high-dimensional statistics of time series. In IGARSS 2018-2018 IEEE International Geoscience and Remote Sensing Symposium (pp. 8647-8650).

Feature generation for spatial-temporal time series stacks.

see example.


Assumptions

We assume that the data stack dimensions are ordered so that the spatial dimensions are first (y,x), followed by the spectral dimension of size p, finishing with the temporal dimension. Algorithms reduce in the last dimension (typically, the temporal dimension).
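For instance, a stack with 100×100 pixels, 8 spectral bands, and 24 time steps is laid out as follows (a minimal numpy sketch; the shape values are illustrative):

```python
import numpy as np

# Stack ordered as (y, x, p, t): spatial dimensions first, then the
# spectral dimension of size p, then the temporal dimension last.
X = np.random.rand(100, 100, 8, 24).astype(np.float32)

# Algorithms reduce over the last (temporal) axis; e.g. a simple
# per-band median composite collapses t and keeps (y, x, p):
composite = np.median(X, axis=-1)
print(composite.shape)   # (100, 100, 8)
```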


Research and Development / Advanced Implementations

All advanced implementations and cutting-edge research codes are now found in github.com/daleroberts/hdstats-private. These are only available to research collaborators.

hdstats's People

Contributors

daleroberts, kirill888, omad


hdstats's Issues

Install fails for python 3.11: longintrepr.h: No such file or directory

pip install hdstats on python 3.11.x produces the following error. Additional context here.

Collecting hdstats
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: numpy in /srv/conda/envs/notebook/lib/python3.11/site-packages (from hdstats) (1.24.3)
Requirement already satisfied: scipy in /srv/conda/envs/notebook/lib/python3.11/site-packages (from hdstats) (1.10.1)
Building wheels for collected packages: hdstats
  Building wheel for hdstats (pyproject.toml) ... error
  error: subprocess-exited-with-error
  
  × Building wheel for hdstats (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [48 lines of output]
      {'include_dirs': ['/tmp/pip-build-env-zn64f58o/overlay/lib/python3.11/site-packages/numpy/core/include'], 'extra_compile_args': ['-fopenmp'], 'extra_link_args': ['-fopenmp'], 'define_macros': []}
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-311
      creating build/lib.linux-x86_64-cpython-311/hdstats
      copying hdstats/__init__.py -> build/lib.linux-x86_64-cpython-311/hdstats
      copying hdstats/tsslow.py -> build/lib.linux-x86_64-cpython-311/hdstats
      copying hdstats/utils.py -> build/lib.linux-x86_64-cpython-311/hdstats
      running egg_info
      writing hdstats.egg-info/PKG-INFO
      writing dependency_links to hdstats.egg-info/dependency_links.txt
      writing requirements to hdstats.egg-info/requires.txt
      writing top-level names to hdstats.egg-info/top_level.txt
      reading manifest file 'hdstats.egg-info/SOURCES.txt'
      reading manifest template 'MANIFEST.in'
      warning: no files found matching '*.rst' anywhere in distribution
      warning: no files found matching '*.pxd' anywhere in distribution
      warning: no files found matching '*.h' anywhere in distribution
      no previously-included directories found matching '.eggs'
      no previously-included directories found matching 'tests'
      no previously-included directories found matching 'data'
      no previously-included directories found matching 'docs'
      no previously-included directories found matching '.vscode'
      writing manifest file 'hdstats.egg-info/SOURCES.txt'
      copying hdstats/dtw.c -> build/lib.linux-x86_64-cpython-311/hdstats
      copying hdstats/dtw.pyx -> build/lib.linux-x86_64-cpython-311/hdstats
      copying hdstats/geomad.c -> build/lib.linux-x86_64-cpython-311/hdstats
      copying hdstats/geomad.pyx -> build/lib.linux-x86_64-cpython-311/hdstats
      copying hdstats/geomedian.c -> build/lib.linux-x86_64-cpython-311/hdstats
      copying hdstats/geomedian.pyx -> build/lib.linux-x86_64-cpython-311/hdstats
      copying hdstats/ts.c -> build/lib.linux-x86_64-cpython-311/hdstats
      copying hdstats/ts.pyx -> build/lib.linux-x86_64-cpython-311/hdstats
      running build_ext
      skipping 'hdstats/geomedian.c' Cython extension (up-to-date)
      skipping 'hdstats/geomad.c' Cython extension (up-to-date)
      skipping 'hdstats/ts.c' Cython extension (up-to-date)
      skipping 'hdstats/dtw.c' Cython extension (up-to-date)
      building 'hdstats.geomedian' extension
      creating build/temp.linux-x86_64-cpython-311
      creating build/temp.linux-x86_64-cpython-311/hdstats
      /srv/conda/envs/notebook/bin/x86_64-conda-linux-gnu-cc -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /srv/conda/envs/notebook/include -fPIC -O2 -isystem /srv/conda/envs/notebook/include -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /srv/conda/envs/notebook/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /srv/conda/envs/notebook/include -fPIC -I/tmp/pip-build-env-zn64f58o/overlay/lib/python3.11/site-packages/numpy/core/include -I/srv/conda/envs/notebook/include/python3.11 -c hdstats/geomedian.c -o build/temp.linux-x86_64-cpython-311/hdstats/geomedian.o -fopenmp
      hdstats/geomedian.c:196:12: fatal error: longintrepr.h: No such file or directory
        196 |   #include "longintrepr.h"
            |            ^~~~~~~~~~~~~~~
      compilation terminated.
      error: command '/srv/conda/envs/notebook/bin/x86_64-conda-linux-gnu-cc' failed with exit code 1
      [end of output]
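For context: CPython 3.11 removed longintrepr.h from its public include directory (the contents moved under the internal headers), so .c files generated by Cython releases that predate 3.11 support (roughly, older than 0.29.32) and still #include it directly fail to compile exactly as above. The fix is to regenerate the shipped .c files from the .pyx sources with a newer Cython. A small diagnostic sketch of the cause, not a fix:

```python
import os.path
import sys
import sysconfig

# On 3.11+ the public include directory no longer ships longintrepr.h,
# which is why C sources generated by an older Cython fail to build.
include_dir = sysconfig.get_path("include")
has_header = os.path.exists(os.path.join(include_dir, "longintrepr.h"))
print(sys.version_info[:2], has_header)
```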

HDStats is private, can we make it public

Hey Dale

Can we please make this repo public and perhaps hosted in the ODC organisation?

It's Apache-licensed, but it's not open if it's not discoverable.

Is there any good reason for it to remain private?

Excessive memory use when computing various MADs on nan data

This is due to the use of the __bad_mask function:

return np.isnan(X.sum(axis=2)).all(axis=2)

It creates very large intermediate arrays:

  • NX*NY*NT*(sizeof(float32) + sizeof(bool))

When operating on large inputs (close to total RAM capacity) this becomes a major problem; it is also slow to allocate that much extra RAM. The computation can be performed without any extra RAM. It is also serial, since it uses numpy with default settings.

I feel it's not even needed when computing MADs; we should instead modify the MADs to detect the "all NaNs" input case, output NaN for those pixels, and avoid a double pass (of which the first is serial) over the data.

Also, we should consider providing a combined function that outputs all of the MAD statistics in one pass over the data; this would also allow us to re-use shared computation.

Furthermore, the MAD computation produces another temporary array of size NX*NY*NT*(sizeof(float32)), which can also be avoided by running np.{nan_}median straight after computing the weights.

For every X,Y pixel column (in parallel):
    Compute Weights
    Squeeze out NaNs
    Run Median or report NaN if no valid data found

The above would only use NT*(sizeof(float32))*Nparallelism worth of temporary RAM.
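The per-pixel scheme above can be sketched in plain numpy (function and variable names are hypothetical; X has shape (ny, nx, p, nt), gm has shape (ny, nx, p), and parallelism would come from distributing the outer loops):

```python
import numpy as np

def emad_lowmem(X, gm):
    """Per-pixel sketch: compute each time slice's distance to the
    geomedian, squeeze out NaNs, and take the median, using only an
    nt-length temporary per pixel instead of full-stack intermediates."""
    ny, nx, p, nt = X.shape
    out = np.full((ny, nx), np.nan, dtype=np.float32)
    for y in range(ny):
        for x in range(nx):
            # distance of each time slice to the geomedian pixel
            d = np.sqrt(((X[y, x] - gm[y, x, :, None]) ** 2).sum(axis=0))
            d = d[~np.isnan(d)]        # squeeze out NaNs
            if d.size:                 # report NaN if no valid data found
                out[y, x] = np.median(d)
    return out
```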

uint16 versions of bcmad and smad ignore the nodata parameter

PR is coming.

BCMAD uint16:

hdstats/hdstats/pcm.pyx

Lines 947 to 955 in 7f3a18a

numer = 0.
denom = 0.
for j in range(p):
    scaled = X[row, col, j, t] * scale + offset
    numer = numer + fabs(scaled - gm[row, col, j])
    denom = denom + fabs(scaled + gm[row, col, j])
result[row, col, t] = numer / denom

SMAD uint16:

hdstats/hdstats/pcm.pyx

Lines 890 to 899 in 7f3a18a

numer = 0.
norma = 0.
for j in range(p):
    scaled = X[row, col, j, t] * scale + offset
    value = scaled * gm[row, col, j]
    numer = numer + value
    norma = norma + scaled*scaled
result[row, col, t] = 1. - numer/(sqrt(norma)*normb_sqrt)

The EMAD version does check whether X is nodata:

hdstats/hdstats/pcm.pyx

Lines 827 to 833 in 7f3a18a

total = 0.
for j in range(p):
    int_value = X[row, col, j, t]
    if int_value != nodata:
        value = int_value * scale + offset - gm[row, col, j]
        total = total + value*value
result[row, col, t] = sqrt(total)

But the other two should too.
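A sketch of what the fix might look like for BCMAD, shown here as plain Python for one (row, col, t) sample rather than the actual Cython kernel (the function name is hypothetical; the guard mirrors the EMAD excerpt above):

```python
from math import fabs

def bcmad_sample(x_col, gm_col, nodata, scale=1.0, offset=0.0):
    """Hypothetical fix sketch: the BCMAD inner loop over the p bands of
    one sample, with the same nodata guard the EMAD kernel already has."""
    numer = denom = 0.0
    for xv, g in zip(x_col, gm_col):
        if xv != nodata:               # skip nodata samples, as EMAD does
            scaled = xv * scale + offset
            numer += fabs(scaled - g)
            denom += fabs(scaled + g)
    return numer / denom if denom else float("nan")
```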

Also, scale and offset are confusing; they either should not be there or should default to scale=1, offset=0. The int16 version of geomedian scales its output back to the original scale, so if you just pass that in, the results are all wrong.

Missing test fixture file

test_pcm.py expects a file at tests/landchar-small.pkl but no such file is present in the repo.

If it's small enough, checking it in should be fine, but code to generate one, with some notes on how to run it, would be nice too.

emad gives incorrect result

In __emad() in pcm.pyx, invalid pixels have a Euclidean distance of 0 because nan is not added. This results in a time series of Euclidean distances made up of 0s and positive numbers. The nanmedian() calculation does not reject the 0 values and therefore gives an incorrect emad result. For cloudy areas where a pixel is masked out more than half of the time, emad for the period will be 0.
smad and bcmad appear to be working properly.
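A minimal illustration of the failure mode (the distance values are made up):

```python
import numpy as np

# Distance series for one pixel: three masked-out (invalid) slices were
# written as 0.0 instead of NaN, plus two valid distances.
d_buggy = np.array([0.0, 0.0, 0.0, 1.2, 1.5])
print(np.nanmedian(d_buggy))   # 0.0 -- the zeros drag the median down

# Writing NaN for invalid slices lets nanmedian reject them.
d_fixed = np.array([np.nan, np.nan, np.nan, 1.2, 1.5])
print(np.nanmedian(d_fixed))   # 1.35
```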
