Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential problem in branch-type classification of cycle #166

Open
kushaangupta opened this issue Mar 22, 2022 · 6 comments
Open

Potential problem in branch-type classification of cycle #166

kushaangupta opened this issue Mar 22, 2022 · 6 comments

Comments

@kushaangupta
Copy link
Contributor

kushaangupta commented Mar 22, 2022

The following minimal skeleton is not being classified as cycle in summary. I was wondering if it's a problem or a compromise of the current algorithm for branch-type classification.

import numpy as np
import matplotlib.pyplot as plt
from skan import csr, draw
skel = csr.Skeleton(np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 1]
]))
fig, ax = plt.subplots(figsize=(4,4))
draw.overlay_skeleton_networkx(skel.graph, skel.coordinates, axis=ax)
csr.summarize(skel)

90d86b01-25f0-47e9-993f-3e662324e249

skeleton-id node-id-src node-id-dst branch-distance branch-type mean-pixel-value stdev-pixel-value image-coord-src-0 image-coord-src-1 image-coord-dst-0 image-coord-dst-1 coord-src-0 coord-src-1 coord-dst-0 coord-dst-1 euclidean-distance
0 2 3 4.242641 2 1.0 0.0 1 2 2 1 1 2 2 1 1.414214
0 2 3 1.414214 2 1.0 0.0 1 2 2 1 1 2 2 1 1.414214
0 2 3 2.000000 2 1.0 0.0 1 2 2 1 1 2 2 1 1.414214

At present only paths with src == dst are classified as cycles, but what about the case where two paths have same src, dst nodes?

@jni
Copy link
Owner

jni commented Mar 22, 2022

@kushaangupta can you please include complete imports, screenshots, and outputs when making issues?

At any rate, wow that is not what I expected to see:

   skeleton-id  node-id-src  node-id-dst  branch-distance  ...  coord-src-1  coord-dst-0  coord-dst-1  euclidean-distance
0            0            2            3         4.242641  ...            2            2            1            1.414214
1            0            2            3         1.414214  ...            2            2            1            1.414214
2            0            2            3         2.000000  ...            2            2            1            1.414214

What I expected to happen is that the "junction" at the bottom right would be cleared by the MST method, and then there would be a single loop, going from 4 to 4. So this indeed appears to be a major bug. I don't think it's at the classification step, but at the graph simplification and traversal steps.

@kushaangupta
Copy link
Contributor Author

can you please include complete imports, screenshots, and outputs when making issues?

Done

I don't think it's at the classification step, but at the graph simplification and traversal steps.

I understand. So, is the problem in scipy.sparse.csgraph.minimum_spanning_tree?

@jni
Copy link
Owner

jni commented Mar 23, 2022

So, is the problem in scipy.sparse.csgraph.minimum_spanning_tree

More likely, in how we use it, here:

skan/src/skan/csr.py

Lines 774 to 814 in f606888

def _mst_junctions(csmat):
"""Replace clustered pixels with degree > 2 by their minimum spanning tree.
This function performs the operation in place.
Parameters
----------
csmat : NBGraph
The input graph.
pixel_indices : array of int
The raveled index in the image of every pixel represented in csmat.
spacing : float, or array-like of float, shape `len(shape)`, optional
The spacing between pixels in the source image along each dimension.
Returns
-------
final_graph : NBGraph
The output csmat.
"""
# make copy
# mask out all degree < 3 entries
# find MST
# replace edges not in MST with zeros
# use .eliminate_zeros() to get a new matrix
csc_graph = csmat.tocsc()
degrees = np.asarray(csmat.astype(bool).astype(int).sum(axis=0))
non_junction = np.flatnonzero(degrees < 3)
non_junction_column_start = csc_graph.indptr[non_junction]
non_junction_column_end = csc_graph.indptr[non_junction + 1]
for start, end in zip(non_junction_column_start, non_junction_column_end):
csc_graph.data[start:end] = 0
csr_graph = csc_graph.tocsr()
non_junction_row_start = csr_graph.indptr[non_junction]
non_junction_row_end = csr_graph.indptr[non_junction + 1]
for start, end in zip(non_junction_row_start, non_junction_row_end):
csr_graph.data[start:end] = 0
csr_graph.eliminate_zeros()
mst = csgraph.minimum_spanning_tree(csr_graph)
non_tree_edges = csr_graph - (mst + mst.T)
final_graph = csmat - non_tree_edges
return final_graph

@jni
Copy link
Owner

jni commented Mar 23, 2022

This function performs the operation in place.

☝️ that's super-suspicious 😂 I wonder if that factor, combined with the new traversal algorithm (#152) have created an error. If I remember correctly, the algorithm sets the entries to 0. But it might actually need to delete the entries (relatively expensive, as it requires copying the whole csr graph), or we need to change the algorithm to ignore 0 entries.

@jni
Copy link
Owner

jni commented Mar 23, 2022

Actually, I think copying the graph is better than having an additional if expression for every edge.

@jni
Copy link
Owner

jni commented Mar 29, 2022

@kushaangupta and I just had a meeting where we tried to hash this out, and we realised that this is not really a bug at all, it's only a bug in our expectations of what should happen in such a graph. In fact, the node 4 is not a junction node, so skan's current interpretation is correct: there are two junction nodes, 2 and 3, and there are three ways to get from one to the other: via the direct edge, via 0 and 1, and via 4.

One implication of this is that the junction graph will need to support being a multi-graph. This is difficult/impossible with a csr graph. 😞

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants