Comments (10)
Yep, I am aware of this issue which is the result of the flagser method not taking into account the flag_matrix format and converting it to a sparse matrix. This conversion removes all non-diagonal zeros.
I have a fix and I will make a PR tomorrow.
from pyflagser.
Btw, I believe you are not setting to 0 the right indices. This is dist_0:
[[0. 0. 1.90211303 1.90211303 1.1755705 0. ]
[0. 0. 0. 1.90211303 1.90211303 1.1755705 ]
[1.90211303 0. 0. 0. 1.90211303 1.90211303]
[1.90211303 1.90211303 0. 0. 0. 1.90211303]
[1.1755705 1.90211303 1.90211303 0. 0. 1.1755705 ]
[0. 1.1755705 1.90211303 1.90211303 1.1755705 0. ]]
Note the 0 on the corners instead of (6,5) and (5,6).
from pyflagser.
@ulupo Indeed, it does make sense. But the following problem remains. If you have a dense unweighted graph, you will store it as a matrix of dtype bool. As it is not sparse, storing it in scipy.sparse
format, will be almost 3 times as costly in memory as storing it as np.ndarray
.But passing it as a dense matrix will be understood by the flagser
method as a two step filtration of a fully connected graph. While passing it as a sparse matrix yields the expected behavior. You can't have np.inf
in a np.ndarray
of bools so you're stuck. I am afraid these edge cases will make the documentation very confusing.
What I would prefer is to do both solution together. What do you think of the following:
flag_matrix : ndarray or scipy.sparse matrix, required
Matrix representation of a directed/undirected weighted/unweighted
graph. Diagonal elements are vertex weights. The way zero values are
handled depending on the format of the matrix. If the matrix is a dense
``np.ndarray``, zero values denote zero-weighted edges and ``np.inf``
values absent edges. If the matrix is a sparse ``scipy.sparse`` matrix,
explicitely stored off-diagonal zeros and all diagonal zeros denote
zero-weighted edges. Off-diagonal values that have not been explicitely
stored are treated by ``scipy.sparse`` as zeros but will be understood
as ``np.inf``-valued edges, i.e., edges absent from the filtration.
inifinte_weights : ndarray or scipy.sparse matrix of bool or ``None``,
optional, default: ``None``
Mask representing the indices of the edges that have infinite weights.
If ``None``, contains only ``False``. It is particularly useful if the
`flag_matrix` is a dense ``np.ndarray`` matrix and its dtype does not
allow ``np.inf`` values.
Also, to answer your question, what is passed to C++ flagser is a list of row indices r, column indices c, and associated weight w denoting the fact that the edge (r_i c_i) has weight w_i.
from pyflagser.
@wreise What I have done fixes the problem but not in all cases. In an adjacency matrix representation, a 0
have a double meaning. It both means no edge or edge right from the start. Most scipy.sparse
data structure are able to deal with that. If a 0 was purposefully inserted, it will be remembered. But then, if the adjacency matrix is too dense, they are very expensive memory-wise. However, numpy
dense arrays will always treat all 0 as explicit. This is what you want in your case, but it removes the possibility of having a double meaning for 0
s.
from pyflagser.
@gtauzin , thank you for your detailed answer and noticing my mistake.
Given the performance issue that you pointed out, maybe it is better to keep the current, sparse interpretation.
An alternative idea would be documenting this behavior - with the stability guarantees, I wouldn't mind.
from pyflagser.
@ulupo @wreise
I have thought of nicer solutions to simplify the problem of having a dual meaning for 0
s
- Allow dense flag_matrix to have
np.inf
weights that means no edge, leaving0
-weighted edges with the meaning of an edge appearing at filtration value0
. - Then a scipy.sparse matrix stores only edges representing connections at the weight of the edge. In particular, all stored
0
-weighted edges are all edges appearing at filtration value0
. Unstored edges then consistently mean that they arenp.inf
-weighted
The only problem is that this only makes sense if flag_matrix.dtype if float, which is however usually the case in real-life problems. Even well-documented, this behavior will be confusing for the user.
Another solution would be to allow the user to pass a mask impicit_zeros to the flagser
method when the flag_matrix is dense.
from pyflagser.
@gtauzin I find np.inf
to be particularly attractive as it is mathematically very meaningful and in line with the behaviour of ripser
. I also like the suggested behaviour in the sparse case: how does the C++ backend currently handle unstored edges when passed sparse matrices?
from pyflagser.
@gtauzin not directly answering the question yet, but I'm thinking that these issues seem to all arise from the fundamental fact that flagser
tries to be smart about deciding whether the input represents a filtered (directed) simplicial complex or a static complex. This is connected to another issue I've had since the beginning, which is that flagser
tries to return Betti numbers even when it returns barcodes. In my mind, this is awkward as returning barcodes suggests the input has been interpreted as a filtered complex, while returning Betti numbers suggests that it was also interpreted as a static one.
Personally, I would like to discuss whether we can separate the behaviour of flagser
more clearly between the two options. One idea would be to add another parameter which indicates how the input should be interpreted (filtration or not) – e.g. one could have persistence=True
or persistence=False
. The return values would also be different, e.g. why return Betti numbers when persistence=True
? One could only return barcodes then.
I am hopeful that this would kill two birds with one stone: clarify the purpose of the flagser
function, and remove ambiguities about the treatment of input. When persistence=True
, one could decide on a standard for sparse matrices where absent entries mean infinite weights and zero weights need to be explicitly set (btw, on this point, I would also argue that diagonal weights should be explicitly set to zero even in the sparse case, but we can discuss this separately).
In my opinion, your example would not be relevant in the case persistence=True
as a two-step filtration where everything is present in the second step is better thought about as not a filtration at all. So one would not want to compute the barcode in such a case, and rather only "static" information. When persistence=False
it would be natural for False
to behave as np.inf
(absent edge) so there is no tension.
The problem arguably remains that for int
dtypes, there is no np.inf
either. Thus, something else should denote the absence of an edge when persistence=False
and the input is an ndarray. The obvious candidate is 0
because this is consistent with False
in the boolean case. This would be OK for sparse matrices too as most of the time people assume absent entries are 0
, but is at odds with the presence of explicitly set zeros in the case when persistence=False
. So I'm not claiming the proposed solution is perfect yet.
from pyflagser.
@ulupo Thanks for the input!
The betti numbers outputed by flagser
corresponds to the number of infinite bars. I think it can be of interest in both cases (filtration or not). If you imagine that the input graph is a distance matrix thresholded by a max_edge_length
that corresponds to a good guess for the homology of the VR complex at that value to be the same than the homology of the underlying manifold the point clouds are being sampled from. Then it outputs meaningful betti numbers.
Before going any further, I would argue that the current solution already on master needs to be made into a release ASAP. The current version, v0.2.1 is almost unusable. If we do not agree on a solution now, I believe we should go forward with #32 and then #31. And then implement a nicer solution.
That being said, let me summarize here the situation because it is getting quite complex for me to follow. We have different cases (dtype
, format
, persistence
, and diagonal
).
If
dtype = float
:
persistence = True
: It is reasonable to think the user wants persistence. If not, the user can easily change its dtype to bool. We can leave the responsability to the user to pass the right boolean matrix as an input to flagser.
If
format = np.ndarray
:
Both0
andnp.inf
are possible and denote respectively 0-weighted edges and no edges.If
format = np.ndarray
:
Explicit0
and implicit0
denote respectively 0-weighted edges and no edges.The behavior on and off
diagonal
is the same.
If
dtype = int
:
persistence = True
: It is also reasonable to think the user wants persistence. If not, the user can easily change its dtype to bool. We can leave the responsability to the user to pass the right boolean matrix as an input to flagser.
If
format = np.ndarray
:
np.inf
is not possible and0
should most likely denote 0-weighted edges.If
format = np.ndarray
:
Explicit0
and implicit0
are both possible and could denote respectively 0-weighted edges and no edges.
If
dtype = bool
:
persistence = False
: It is reasonable to think the user does not want persistence.
If
format = np.ndarray
:
np.inf
is not possible. But aspersistence = True
, it does not matter.False
means no edge offdiagonal
and False-weight on-diagonal
.If
format = np.ndarray
:
ExplicitFalse
and implicitFalse
are both possible but both mean the same.False
means no edge offdiagonal
and False-weight on-diagonal
.
Laying down all possibilities, it seems to me that the value of persistence
can always reasonably be inferred from the matrix dtype
. The actual problem we are facing is only with dtype = int
!
I believe we should forget about the infinite_weight
mask and allow np.inf
when dtype = float
and then forbid the user to pass a matrix with dtype = int
.
from pyflagser.
@gtauzin so if I understand correctly you are arguing that while a persistence
parameter would help remove ambiguities, another way consists of:
- disallowing
int
array input; - establishing the rule that
float
is the only dtype associated to persistence (and cannot be used for "static" computations), whilebool
is always associated to "static" computations (and cannot be used for persistence).
Concerning 2), I am a bit confused by the statements "False
means [...] 0-weight on-diagonal" in the dtype=bool
case. Surely, if bool
types are always associated to no persistence, we should not be talking about weights?
So far, I thought it would be good to support int data types for both filtered and unfiltered situations. In the filtered case, ints are special instances of real numbers so the same logic can apply (modulo issues with np.inf
), and int arrays are more light-weight than their float counterparts (performance). In the unfiltered case, one can see True
as corresponding to 1
and False
to 0
. I agree this second point is weaker. And the first point may be moot also if e.g. C++ flagser always converts weights to floats (is that the case?).
from pyflagser.
Related Issues (20)
- Computations following an interrupted call fail HOT 5
- Support for filtration algos HOT 2
- Computations take a long time HOT 2
- Filtration algorithm vertex_degree produces segmentation fault HOT 2
- Incorrect outputs of the flagser bindings.
- One dimensional output for empty persistence diagram HOT 3
- flagser_weighted does not terminate/takes forever when coeff > 2 HOT 5
- Relax condition on square shape on sparse input HOT 1
- Warnings and crashes for FlagserPersistence with n_jobs>1 HOT 2
- Feature request: Allow (read only) access to the directed flag complex via python HOT 16
- `save_weighted_flag` crashes (and is uncovered by tests) HOT 3
- flagser_memory support HOT 13
- `pyflagser.flagser_count_unweighted` crashes on really huge graphs HOT 24
- Drop support for Python 3.6 HOT 6
- ERROR pyflagser - ModuleNotFoundError: No module named 'pyflagser.modules' HOT 4
- error 'flagser/src/flagser-count.cpp' file not found HOT 1
- feature request: custom filtration HOT 4
- Error installing pyflagser in macOS 12.4 FileNotFoundError
- [BUG] Error installing pyflagser HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pyflagser.