
persim's Introduction


Scikit-TDA is a home for Topological Data Analysis Python libraries intended for non-topologists.

This project aims to provide a curated library of TDA Python tools that are widely usable and easily approachable. It is structured so that each package can stand alone or be used as part of the scikit-tda bundle.

Documentation

For complete documentation, please check out docs.scikit-tda.org.

Contact

If you would like to contribute, please reach out to us on GitHub by starting a discussion topic or creating an issue, or get in touch on Twitter.

Setup

To install all of these libraries:

    pip install scikit-tda

Citations

If you would like to cite Scikit-TDA, please use the following citation or BibTeX entry:

Saul, Nathaniel and Tralie, Chris. (2019). Scikit-TDA: Topological Data Analysis for Python. Zenodo. http://doi.org/10.5281/zenodo.2533369

@misc{scikittda2019,
  author       = {Nathaniel Saul and Chris Tralie},
  title        = {Scikit-TDA: Topological Data Analysis for Python},
  year         = 2019,
  doi          = {10.5281/zenodo.2533369},
  url          = {https://doi.org/10.5281/zenodo.2533369}
}

License

This package is licensed under the MIT license.

Contributing

Contributions are more than welcome! There are lots of opportunities for potential projects, so please get in touch if you would like to help out. Everything from code to notebooks to examples and documentation is equally valuable, so please don't feel you can't contribute. To contribute, please fork the project, make your changes, and submit a pull request. We will do our best to work through any issues with you and get your code merged into the main branch.

persim's People

Contributors

alpatania, bdice, blasern, calderds, catanzaromj, ctralie, eduph, gabbyangeloro, mmcdermott, outlace, rannbaron, sauln, szhechev, tmelorc


persim's Issues

Bottleneck distance between identical diagrams is non-zero.

The bottleneck function repeatedly outputs a non-zero distance between identical persistence diagram inputs. For example:

import numpy as np
from ripser import Rips
from persim import bottleneck

rips = Rips()
data = np.random.random((5, 2))
pd = rips.fit_transform(data)
d = bottleneck(pd[0], pd[0])
print(d)  # 0.05305507779121399

Note: at first pass, it seems like d gets smaller as the number of points in data increases.

Test failure with Python 3.10

In the Nixpkgs collection we see test failures on the build system with Python 3.10 and Python 3.11.

Executing pytestCheckPhase
============================= test session starts ==============================
platform linux -- Python 3.10.9, pytest-7.2.0, pluggy-1.0.0
rootdir: /build/persim-0.3.1
collected 104 items / 12 deselected / 92 selected                              

test/test_distances.py ..............................                    [ 32%]
test/test_landscapes.py ...........................F                     [ 63%]
[...]

=================================== FAILURES ===================================
____________________ TestTransformer.test_persistenceimager ____________________

self = <test_landscapes.TestTransformer object at 0x7fff83f6ec50>

    def test_persistenceimager(self):
        pl = PersistenceLandscaper(hom_deg=0, num_steps=5, flatten=True)
        assert pl.hom_deg == 0
        assert not pl.start
        assert not pl.stop
        assert pl.num_steps == 5
        assert pl.flatten
        dgms = [np.array([[0, 3], [1, 4]]), np.array([[1, 4]])]
        pl.fit(dgms)
        assert pl.start == 0
        assert pl.stop == 4.0
        np.testing.assert_array_equal(
            pl.transform(dgms),
            np.array([0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0,]),
        )
        pl2 = PersistenceLandscaper(hom_deg=1, num_steps=4)
        assert pl2.hom_deg == 1
        pl2.fit(dgms)
        assert pl2.start == 1.0
        assert pl2.stop == 4.0
        np.testing.assert_array_equal(pl2.transform(dgms), [[0.0, 1.0, 1.0, 0.0]])
        pl3 = PersistenceLandscaper(hom_deg=0, num_steps=5, flatten=True)
        np.testing.assert_array_equal(
>           pl3.fit_transform(dgms),
            np.array([0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0,]),
        )

test/test_landscapes.py:532: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/nix/store/avaqkdjpx2w50dgyq9769fa3ghvr8cvx-python3.10-scikit-learn-1.2.1/lib/python3.10/site-packages/sklearn/utils/_set_output.py:142: in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = PersistenceLandscaper(hom_deg=0, start=0, stop=4, num_steps=5)
dgms = [array([[0, 3],
       [1, 4]]), array([[1, 4]])]

    def fit_transform(self, dgms):
        self.fit(dgms=dgms)
>       vals = self.transform(dgms=dgms)
E       TypeError: PersistenceLandscaper.transform() missing 1 required positional argument: 'X'

persim/landscapes/transformer.py:132: TypeError
=============================== warnings summary ===============================
../../nix/store/sgaywpksqi4iqq1b148hagzcbmwpwzbm-python3.10-joblib-1.2.0/lib/python3.10/site-packages/joblib/backports.py:22
  /nix/store/sgaywpksqi4iqq1b148hagzcbmwpwzbm-python3.10-joblib-1.2.0/lib/python3.10/site-packages/joblib/backports.py:22:
[...]
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED test/test_landscapes.py::TestTransformer::test_persistenceimager - TypeError: PersistenceLandscaper.transform() missing 1 required positional ...
=========== 1 failed, 91 passed, 12 deselected, 11 warnings in 3.24s ===========
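
A possible diagnosis, based only on the traceback above (an assumption, not confirmed upstream): recent scikit-learn wraps transform() in a set_output decorator whose first parameter is named X and is forwarded positionally, so the keyword call self.transform(dgms=dgms) inside fit_transform never binds it. A toy reproduction of that interaction (hypothetical names, not persim code):

import functools

# Stand-in for sklearn's _set_output wrapper: it forwards the first
# argument positionally as X before calling the wrapped method.
def set_output_like(f):
    @functools.wraps(f)
    def wrapped(self, X, *args, **kwargs):
        return f(self, X, *args, **kwargs)
    return wrapped

class ToyLandscaper:
    @set_output_like
    def transform(self, X):
        return X

    def fit_transform(self, dgms):
        # self.transform(dgms=dgms) raises: wrapped() missing positional 'X'
        return self.transform(dgms)  # a positional call binds X and works

print(ToyLandscaper().fit_transform([1, 2, 3]))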

plot diagram only in blue color (no orange)

First of all, since I was not able to install via pip install persim, I downloaded the source scikit-tda-master.zip and ran setup.py. Also, I'm using Python 2.7 with matplotlib version 1.5.1.

The minimal code below works, except that the diagrams in both dimensions are plotted in blue, not blue/orange as in the documentation.

MWE

import numpy as np
import matplotlib.pyplot as plt 
from ripser import Rips
from persim import plot_diagrams
import matplotlib
print('my matplotlib is', matplotlib.__version__) # my matplotlib is 1.5.1

pts = np.array([ [0, 0], [.8, 0], [.6, 1], [0, 1]  ])
rips = Rips(maxdim=1, coeff=2, thresh=2)
diagramas = rips.fit_transform(pts)
print(diagramas[0])
print(diagramas[1])

plot_diagrams(diagramas, show=True)

See the screenshot (Figure 1_175).

Implement barcode diagram

A barcode is a graphical representation as a collection of horizontal line
segments in a plane whose horizontal axis corresponds to the parameter and whose
vertical axis represents an (arbitrary) ordering of homology generators. [1]

I'd be happy to contribute my implementation.

[1] Ghrist, R. (2008). Barcodes: The Persistent Topology of Data. Bulletin of the American Mathematical Society, 45(1), 61-75.
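
For anyone who wants to try the idea before a PR lands, here is a minimal sketch of a barcode plot in matplotlib (my own toy version, not the implementation offered above): each (birth, death) pair becomes one horizontal segment, stacked along an arbitrary vertical ordering.

import numpy as np
import matplotlib.pyplot as plt

def plot_barcode(dgm, ax=None, **kwargs):
    # One horizontal segment per (birth, death) pair; the vertical position
    # is just the (arbitrary) index of the homology generator.
    ax = ax or plt.gca()
    for i, (birth, death) in enumerate(np.asarray(dgm, dtype=float)):
        ax.plot([birth, death], [i, i], **kwargs)
    ax.set_xlabel("filtration parameter")
    ax.set_yticks([])
    return ax

plot_barcode([[0.0, 1.2], [0.1, 0.8], [0.2, 0.3]], color="C0")
plt.show()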

Add tag for v0.1.2

Could you add a GitHub release for v0.1.2? This would make it easier to keep the AUR repository up to date.

Bottleneck distance between repeated intervals = 0

The code below should produce a bottleneck distance of 0.5, but it outputs 0.
Any number of repeated intervals (but different between G and H) has the same issue.

import numpy as np
import persim
G = np.array([[0, 1], [0, 1]])
H = np.array([[0, 1]])
print(persim.bottleneck(G, H))

Sliced Wasserstein distance vs Wasserstein distance

I think your work is very meaningful and exciting! But I am a little confused by the description of the Sliced Wasserstein distance:
"The general idea is to compute an approximation of the Wasserstein distance by computing the distance in 1-dimension repeatedly and use the results as a measure."
Does this mean that the Sliced Wasserstein distance is faster than the Wasserstein distance in terms of time complexity?
If not, what is the difference between these two distances in this library?
Thank you for your attention!
Looking forward to your reply!
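
For what it's worth, my understanding (please correct me if persim's implementation differs): the sliced Wasserstein distance projects the diagram points onto M one-dimensional directions, solves the cheap 1-D transport problem on each projection (essentially a sort), and averages the results, so it approximates the Wasserstein distance and is typically much faster for large diagrams than an exact matching. Assuming the sliced_wasserstein(dgm1, dgm2, M=...) and wasserstein(dgm1, dgm2) signatures, a comparison looks like:

import numpy as np
import persim

dgm_a = np.array([[0.0, 1.0], [0.2, 0.7]])
dgm_b = np.array([[0.0, 0.9], [0.3, 0.6]])

# Exact (matching-based) Wasserstein distance: accurate but more expensive.
d_exact = persim.wasserstein(dgm_a, dgm_b)

# Sliced approximation: M random 1-D projections, each solved by sorting.
d_sliced = persim.sliced_wasserstein(dgm_a, dgm_b, M=50)

print(d_exact, d_sliced)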

numpy incompatibility

Sorry! Just ignore my message below. I noticed that the issue has already been fixed.

Hello,
It seems that the current version of NumPy no longer accepts a data type as the second argument of numpy.copy(), which affects the following line:

dgs = [np.copy(diagram, np.float64) for diagram in diagrams]

Can you modify the line to something like the following?
dgs = [np.copy(diagram.astype(np.float64)) for diagram in diagrams]
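
For context, my reading of the failure (not part of the original report): numpy.copy's second positional parameter is order, not a dtype, so np.float64 ends up being interpreted as an ordering flag, which newer NumPy rejects. A small illustration of an equivalent, version-safe copy:

import numpy as np

diagram = np.array([[0.0, 1.0], [0.5, 2.0]])

# np.copy(diagram, np.float64) passes np.float64 into the `order` slot of
# np.copy(a, order='K', subok=False), hence the error on recent NumPy.
# A dtype-forcing copy can be spelled explicitly instead:
dg = np.array(diagram, dtype=np.float64, copy=True)
print(dg.dtype)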

Bug Report: `persim.bottleneck` returns 0 for distinct diagrams

Description:

The persim.bottleneck function returns an incorrect result for some input persistence diagrams. When using dgm1 = np.array([[0, 10]]) and dgm2 = np.array([[5, 5]]), the function incorrectly returns 0, which is inconsistent with the expected behavior.

Environment:
Python version: 3.9.13
Persim version: 0.3.1
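
A minimal reproduction of the reported inputs; the expected value below is my own reasoning (the point (5, 5) lies on the diagonal with zero persistence, so the optimal matching sends (0, 10) to the diagonal at cost (10 - 0) / 2 = 5), not a value taken from the report:

import numpy as np
import persim

dgm1 = np.array([[0, 10]])
dgm2 = np.array([[5, 5]])  # a point on the diagonal, i.e. zero persistence

# Expected bottleneck distance: 5 (cost of pushing (0, 10) to the diagonal).
print(persim.bottleneck(dgm1, dgm2))  # reportedly returns 0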

Proper convolution

Right now, the value at each pixel is computed by taking the distance from center of the pixel. We need to instead integrate over the entire pixel.
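
A sketch of what integrating over each pixel could look like, using differences of the Gaussian CDF at the pixel edges instead of sampling the density at pixel centers (my illustration of the idea, not persim's code):

import numpy as np
from scipy.stats import norm

def gaussian_pixel_mass(xedges, yedges, mu, sigma):
    # Integrate an isotropic Gaussian centered at mu over every pixel of a
    # grid by differencing the 1-D CDFs at the pixel edges.
    cdf_x = norm.cdf(xedges, loc=mu[0], scale=sigma)
    cdf_y = norm.cdf(yedges, loc=mu[1], scale=sigma)
    # Outer product of per-axis masses gives the exact integral per pixel.
    return np.outer(np.diff(cdf_x), np.diff(cdf_y))

# Example: 20x20 grid on [0, 1]^2, Gaussian at (0.3, 0.7)
xs = np.linspace(0, 1, 21)
ys = np.linspace(0, 1, 21)
img = gaussian_pixel_mass(xs, ys, mu=(0.3, 0.7), sigma=0.05)
print(img.sum())  # close to 1 when the Gaussian lies well inside the grid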

persim Persistent Image issue

I have an issue producing persistence images using persim in Python.
I'm generating persistence diagrams using the Dionysus software in Python. My problem is that I cannot generate persistence images from the Dionysus output. I have converted the output into a NumPy array, but it still gives me a TypeError. Below is my code and the error message:

import dionysus as d
import numpy as np
from sklearn.preprocessing import normalize
import matplotlib.pyplot as plt
from persim import PersImage
Image = plt.imread("8.png")
normed_matrix = normalize(Image, axis=1, norm='l1')
f_lower_star = d.fill_freudenthal(normed_matrix)
p = d.homology_persistence(f_lower_star)
dgms = d.init_diagrams(p, f_lower_star)
#### I have no issue up to this point; it is the persistence image step that gives me the error!
pim = PersImage(pixels=[20,20],spread=1,verbose=False )
img = pim.transform(dgms[1])

And the error I get is the following:

TypeError                                 Traceback (most recent call last)
in <module>
----> 1 img = pim.transform(dgms)

~/anaconda3/lib/python3.7/site-packages/persim/images.py in transform(self, diagrams)
     92             diagrams = [diagrams]
     93
---> 94         dgs = [np.copy(diagram, np.float64) for diagram in diagrams]
     95         landscapes = [PersImage.to_landscape(dg) for dg in dgs]
     96

~/anaconda3/lib/python3.7/site-packages/persim/images.py in <listcomp>(.0)
     92             diagrams = [diagrams]
     93
---> 94         dgs = [np.copy(diagram, np.float64) for diagram in diagrams]
     95         landscapes = [PersImage.to_landscape(dg) for dg in dgs]
     96

<__array_function__ internals> in copy(*args, **kwargs)

~/anaconda3/lib/python3.7/site-packages/numpy/lib/function_base.py in copy(a, order, subok)
    790
    791     """
--> 792     return array(a, order=order, subok=subok, copy=True)
    793
    794 # Basic operations

TypeError: order must be str, not type

I later converted the Dionysus output into a Python list and then a NumPy array using the following:

l = [(pt.birth, pt.death) for pt in dgms[1]]
f = np.asarray(l)

but I still get the same error!

Any help will be appreciated.
Many thanks,
Aras

Plotting landscapes

I have been plotting a bunch of persistence landscapes for a project I'm working on and I found several issues:

  • Legends extend off the bottom of the plot.
  • Passing tick labels has no effect.
  • The depth axis does not line up correctly.
  • I found it convenient to plot only a specified range of depths.
  • Some of the docstrings are incorrect.
  • There is horizontal padding added to the plots as an impromptu margin setting, but it distorts the actual values in the plots.

I am planning on submitting a PR to fix these issues. If there are any other issues with plotting landscapes, or any issues with the list I have above, please comment and let me know. Thanks!

Error when using transform() for persim.PersImage

ripser behaves normally, but this happens even when I copy and paste the sample code from the User Guide. Here's the error for the user guide code:

Traceback (most recent call last):

  File "F:\HolyCross\Works\2020SUMMER\TDA\D0\untitled0.py", line 43, in <module>
    img = pim.transform(dgms[1])

  File "C:\Users\ThinkPad\anaconda3\lib\site-packages\persim\images.py", line 94, in transform
    dgs = [np.copy(diagram, np.float64) for diagram in diagrams]

  File "C:\Users\ThinkPad\anaconda3\lib\site-packages\persim\images.py", line 94, in <listcomp>
    dgs = [np.copy(diagram, np.float64) for diagram in diagrams]

  File "<__array_function__ internals>", line 6, in copy

  File "C:\Users\ThinkPad\anaconda3\lib\site-packages\numpy\lib\function_base.py", line 775, in copy
    return array(a, order=order, copy=True)

ValueError: Non-string object detected for the array ordering. Please pass in 'C', 'F', 'A', or 'K' instead

I am new to Python; may I know how I can solve this problem?

Warning for essential cycles

If one passes a persistence diagram with infinite death times to any of these functions, they will return NaN. We should warn the user about this, and/or come up with a default option for dealing with infinite death times (e.g., setting infinite death times equal to the diameter of the point set, the maximum finite death time, or some specified value).
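
A sketch of one possible default, assuming the convention of clamping infinite death times to the maximum finite death time (or a user-supplied value); the helper name is illustrative, not an existing persim function:

import numpy as np

def replace_infinite_deaths(dgm, value=None):
    # Replace infinite death times so downstream vectorizations do not
    # return NaN; by default clamp to the maximum finite death time.
    dgm = np.asarray(dgm, dtype=float)
    finite = np.isfinite(dgm[:, 1])
    if value is None:
        value = dgm[finite, 1].max() if finite.any() else 0.0
    out = dgm.copy()
    out[~finite, 1] = value
    return out

dgm = np.array([[0.0, np.inf], [0.1, 0.9], [0.3, 0.5]])
print(replace_infinite_deaths(dgm))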

Bugs in bottleneck distance computation

I'm having an issue with the bottleneck distance function. I haven't dug into the code to figure out what's up, but I'm hoping someone can hunt it down.

First issue: passing an empty diagram (in the form of an empty array) just throws an error.

Second issue: I seem to have created multiple instances of diagrams whose distance to themselves are non-zero:

import numpy as np
from persim import bottleneck
dgmA = np.array([[0.11371516, 4.45734882]])
bottleneck(dgmA, dgmA)

returns 2.17181683. My first guess is that this is caused by diagrams with a single point, but I can't be sure.

Persistent entropy implementation

Hi scikit-tda developers!

I am a PhD student in the Combinatorial Image Analysis research group (University of Seville), and I was wondering if we could add persistent entropy, a topological data analysis tool inspired by information theory, to the scikit-tda library. It has been used in several papers, such as [1], [2], and [3], among others.

[1] Atienza, N., González-Díaz, R., & Rucco, M. (2017). Persistent entropy for separating topological features from noise in Vietoris-Rips complexes. Journal of Intelligent Information Systems, 1-19.
[2] Atienza, N., González-Díaz, R., & Soriano-Trigueros, M. (2018). A new entropy based summary function for topological data analysis. Electronic Notes in Discrete Mathematics, 68, 113-118.
[3] Rucco, M., González-Díaz, R., Jimenez, M. J., Atienza, N., Cristalli, C., Concettoni, E., Ferrante, A., & Merelli, A. (2017). A new topological entropy-based approach for measuring similarities among piecewise linear functions. Signal Processing, 130-138.

Greetings
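
For reference, a minimal sketch of persistent entropy as defined in these papers: with lifetimes l_i = d_i - b_i and L = sum_i l_i, the persistent entropy is E = -sum_i (l_i / L) log(l_i / L). The function below is illustrative and does not assume any existing scikit-tda API:

import numpy as np

def persistent_entropy(dgm):
    # Shannon entropy of the normalized lifetimes of a persistence diagram.
    dgm = np.asarray(dgm, dtype=float)
    lifetimes = dgm[:, 1] - dgm[:, 0]
    lifetimes = lifetimes[np.isfinite(lifetimes) & (lifetimes > 0)]
    p = lifetimes / lifetimes.sum()
    return -np.sum(p * np.log(p))

print(persistent_entropy([[0.0, 1.0], [0.2, 0.7], [0.5, 0.6]]))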

Persistence image for H0?

I want to generate a persistence image for H0, but the values in the persistence image are always NaN. This confuses me. Can you give an example?
Thanks!
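
One likely cause (an assumption, not a confirmed answer): the H0 diagram always contains a bar with an infinite death time, and that infinity propagates NaNs through the image. Dropping (or clamping) the infinite bar before transforming usually helps; a sketch, assuming the PersistenceImager fit/transform interface accepts a single (n, 2) array:

import numpy as np
from ripser import ripser
from persim import PersistenceImager

data = np.random.random((100, 2))
h0 = ripser(data)["dgms"][0]

# H0 contains one bar that never dies (death = inf); drop it before imaging.
h0_finite = h0[np.isfinite(h0[:, 1])]

pimgr = PersistenceImager(pixel_size=0.05)
pimgr.fit(h0_finite)
img = pimgr.transform(h0_finite)
print(np.isnan(img).any())  # expected: False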

Computing wasserstein distance between persistence diagrams

Hello,
I want to use your library but I got stuck, and I would very much appreciate your help with my issue.
I'm trying to compute the Wasserstein distance between two persistence diagrams generated with the Python ripser library. I found two interesting functions in persim: sliced_wasserstein and wasserstein_matching.

My diagram generation looks like this:

data = json.loads(data)
data = pd.DataFrame.from_dict(data)
rips = Rips()
dgms = rips.fit_transform(data)
for i in dgms:
    print(type(i))
    i.tofile(directory+"diagram.txt")
plot_diagrams(dgms, show=False)
plt.savefig("persistence_diagram.png")
plt.close()

'dgms' is a list containing NumPy arrays, so I'm pulling them out in my 'for' loop.

My Wasserstein function usage looks like this:

with open(loc) as f:
    img1 = np.fromfile(f)
    f.close()
with open(loc2) as f:
    img2 = np.fromfile(f)
    f.close()
persim.sliced_wasserstein(img1, img2)

I tried to pass three kinds of data to wasserstein_matching (the diagram as a .png, the dgms list, and a NumPy array), but all I get is the error 'IndexError: too many indices for array'. So I switched to sliced_wasserstein, where I get this error:
Traceback (most recent call last):
  File "C:/Users/Patka/PycharmProjects/MGR/Mapper.py", line 26, in <module>
    persim.sliced_wasserstein(img1, img2)
  File "C:\Users\Patka\environmentpython\lib\site-packages\persim\sliced_wasserstein.py", line 53, in sliced_wasserstein
    sw += step * cityblock(sorted(V1), sorted(V2))
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

One weird thing for me is that when I print i.shape before saving to a file, I get two dimensions, e.g. (12, 2), but when I read from the same file using numpy.fromfile() I get a tuple (12,).

Does anybody have a cure for that? My final aim is to compute distances for lots of diagrams and cluster them, but I'm stuck on comparing just two...
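
One suggestion (mine, not from the thread): ndarray.tofile()/np.fromfile() discard the array shape, which is why an (n, 2) diagram comes back flattened; np.save()/np.load() preserve it. Both distance functions expect (n, 2) arrays of (birth, death) pairs rather than images. A sketch:

import numpy as np
import persim
from ripser import Rips

rips = Rips()
dgm1 = rips.fit_transform(np.random.random((100, 2)))[1]

# Save/load with the shape preserved, unlike tofile()/fromfile().
np.save("diagram_h1.npy", dgm1)
dgm1 = np.load("diagram_h1.npy")

dgm2 = rips.fit_transform(np.random.random((100, 2)))[1]

# Pass the (n, 2) diagrams directly, not image files or flattened vectors.
print(persim.sliced_wasserstein(dgm1, dgm2))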

Persistence Image for H0, H1, and H2 plus dropping inf values

Hello,

I am running

pimgr = PersistenceImager(pixel_size=size,kernel_params={'sigma': [[1.0, 0.0], [0.0, 1.0]]})
pimgr.fit(diagrams_h1)

I was wondering: 1) is it possible to drop features that extend to inf, using flags in PersistenceImager or in Rips, to avoid the "OverflowError: cannot convert float infinity to integer" error when calling pimgr.fit()?

And 2) is it possible to compute the persistence image using more than one level of homology (i.e., H0, H1, and H2 rather than just H1 as in the documentation example)?

Thanks!
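
A sketch of one way to do both things today, assuming PersistenceImager accepts a list of diagrams in fit/transform: filter out infinite deaths yourself, then pass the H0, H1, and H2 diagrams together so they share a common image range.

import numpy as np
import tadasets
from ripser import ripser
from persim import PersistenceImager

data = tadasets.dsphere(d=2, n=300, noise=0.05)  # a noisy 2-sphere
dgms = ripser(data, maxdim=2)["dgms"]            # [H0, H1, H2]

# Drop points with infinite death before fitting; the imager needs a finite range.
finite_dgms = [d[np.isfinite(d[:, 1])] for d in dgms]

pimgr = PersistenceImager(pixel_size=0.05)
pimgr.fit(finite_dgms)               # one common birth/persistence range
imgs = pimgr.transform(finite_dgms)  # one image per homology dimension
print([img.shape for img in imgs])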

Persistence weight function and negative values in persistence image

Is the 'persistence' weight function in PersistenceImager the one defined here, https://jmlr.org/papers/volume18/16-337/16-337.pdf (page 9), that is linear in the persistence coordinate and goes from 0 when the persistence coordinate is 0 to 1 when the persistence coordinate is at its maximum?

I have obtained negative values in a persistence image (they were very close to 0). How is that possible? I used a Gaussian kernel and the 'persistence' weight function, so the values in a persistence image should be sums of products of nonnegative functions.
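
For reference, the piecewise-linear weighting on page 9 of the cited paper, as I read it, is 0 below zero persistence, t/b up to the maximum persistence b, and 1 beyond, so it is nonnegative; in exact arithmetic the image values should then be nonnegative too, and tiny negative values are plausibly floating-point cancellation in the kernel evaluation (a guess, not verified). A small sketch of the weight:

import numpy as np

def persistence_weight(t, b):
    # Piecewise-linear weight: 0 for t <= 0, t/b for 0 < t < b, 1 for t >= b,
    # where t is the persistence coordinate and b its maximum value.
    return np.clip(t / b, 0.0, 1.0)

print(persistence_weight(np.array([-0.1, 0.0, 0.5, 1.0, 2.0]), b=1.0))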

Default weighting function normalization

The default linear weighting function is being normalized per diagram rather than being fixed or normalized per experiment. As a consequence, the norm of a persistence image corresponding to a diagram with one point very close to the diagonal is not close to zero.

persim/persim/images.py

Lines 151 to 160 in 76b2b5a

if landscape is not None:
    if len(landscape) > 0:
        maxy = np.max(landscape[:, 1])
    else:
        maxy = 1

def linear(interval):
    # linear function of y such that f(0) = 0 and f(max(y)) = 1
    d = interval[1]
    return (1 / maxy) * d if landscape is not None else d
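
A toy illustration of the reported behaviour (my own example, not persim code): because maxy is taken from the diagram itself, a diagram whose only point is nearly on the diagonal still assigns that point weight 1.

import numpy as np

landscape = np.array([[0.5, 0.001]])   # one point with persistence 0.001
maxy = np.max(landscape[:, 1])         # 0.001, taken from this diagram alone

def linear(interval):
    return (1 / maxy) * interval[1]

print(linear(landscape[0]))  # 1.0, although the point is essentially noise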

Is persim.plot still available?

I'm using Python 3.8 and I have these:

$ pip3.8 list -u
Package         Version
--------------- -------
cycler          0.10.0 
Cython          0.29.14
hopcroftkarp    1.2.5  
Jinja2          2.11.1 
joblib          0.14.1 
kiwisolver      1.1.0  
kmapper         1.2.0  
llvmlite        0.31.0 
MarkupSafe      1.1.1  
matplotlib      3.1.3  
numba           0.48.0 
numpy           1.18.1 
persim          0.1.2  
Pillow          7.0.0  
pip             20.0.2 
pyparsing       2.4.6  
python-dateutil 2.8.1  
ripser          0.4.1  
scikit-learn    0.22.1 
scikit-tda      0.0.3  
scipy           1.4.1  
six             1.14.0 
tadasets        0.0.4  
umap-learn      0.3.10 

The code below from http://persim.scikit-tda.org/notebooks/distances.html

import numpy as np
import persim
import persim.plot
import tadasets
import ripser
import matplotlib.pyplot as plt

gives the error

ModuleNotFoundError: No module named 'persim.plot'

Removing import persim.plot, the error

AttributeError: module 'ripser' has no attribute 'plot_dgms' 

appears when calling ripser.plot_dgms().

Any idea how to solve this? Maybe the documentation is outdated?

Code

import numpy as np
import persim
#import persim.plot
import tadasets
import ripser
import matplotlib.pyplot as plt

data_clean = tadasets.dsphere(d=1, n=100, noise=0.0)
data_noisy = tadasets.dsphere(d=1, n=100, noise=0.1)

plt.scatter(data_clean[:,0], data_clean[:,1], label="clean data")
plt.scatter(data_noisy[:,0], data_noisy[:,1], label="noisy data")
plt.axis('equal')
plt.legend()
plt.show()

dgm_clean = ripser.ripser(data_clean)['dgms'][1]
dgm_noisy = ripser.ripser(data_noisy)['dgms'][1]

ripser.plot_dgms([dgm_clean, dgm_noisy] , labels=['Clean $H_1$', 'Noisy $H_1$'])
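
In case it helps others who land here: in more recent versions the plotting helpers appear to have consolidated into persim.plot_diagrams (the function used elsewhere in these issues), which covers the plot_dgms use case. A hedged adaptation of the snippet above:

import persim
import ripser
import tadasets

data_clean = tadasets.dsphere(d=1, n=100, noise=0.0)
data_noisy = tadasets.dsphere(d=1, n=100, noise=0.1)
dgm_clean = ripser.ripser(data_clean)['dgms'][1]
dgm_noisy = ripser.ripser(data_noisy)['dgms'][1]

# plot_diagrams now lives in persim rather than ripser
persim.plot_diagrams([dgm_clean, dgm_noisy], labels=['Clean $H_1$', 'Noisy $H_1$'])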

Flexible weighting and kernel functions

Right now, Gaussian kernel and linear weighting function are hard coded. The functionality for this is half built, but needs to be threaded through all the way.

Sorry, but I really want to know what this line of code does

Sorry, but I really want to understand what this code does:

curr_img[1:, 1:] - curr_img[:-1, 1:] - curr_img[1:, :-1] + curr_img[:-1, :-1]

It appears in _transform in persim/images.py:

pers_img += wts[i]*(curr_img[1:, 1:] - curr_img[:-1, 1:] - curr_img[1:, :-1] + curr_img[:-1, :-1])
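
My reading of that expression (an interpretation, not an authoritative answer): curr_img holds a cumulative (CDF-like) Gaussian evaluated on the grid of pixel corners, and the four-corner inclusion-exclusion recovers the integral of the Gaussian over each pixel, i.e. the "proper convolution" discussed in another issue. A standalone illustration:

import numpy as np
from scipy.stats import norm

xedges = np.linspace(0, 1, 6)
yedges = np.linspace(0, 1, 6)
X, Y = np.meshgrid(xedges, yedges, indexing="ij")

# Separable bivariate Gaussian CDF evaluated at pixel corners.
mu, sigma = (0.4, 0.6), 0.1
curr_img = norm.cdf(X, mu[0], sigma) * norm.cdf(Y, mu[1], sigma)

# Inclusion-exclusion over the four corners of each pixel gives the
# probability mass (integral of the Gaussian) inside that pixel.
pixel_mass = (curr_img[1:, 1:] - curr_img[:-1, 1:]
              - curr_img[1:, :-1] + curr_img[:-1, :-1])
print(pixel_mass.sum())  # approx. the total Gaussian mass inside the grid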

bring CI pipeline up to Python 3.12

The existing CI test pipeline stops at Python 3.8, and versions prior to 3.6 are no longer included in GitHub's Docker images.

This issue is to bring the CI pipeline up to date to 3.12, which will facilitate the work in this milestone.

This issue blocks the other issues in that milestone.
