Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add function to make nx Graph from Skeleton #224

Merged
merged 22 commits into from
May 1, 2024
Merged

Conversation

ns-rse
Copy link
Contributor

@ns-rse ns-rse commented Jan 10, 2024

Further to initial work by @jni to implement NetworkX based pruning I've finally worked out how to reconstruct Skeleton from the NetworkX graph by storing the path_coordinates() in the edge properties.

We save the .path_coordinates() to the edge properties of a NetworkX graph, using the row.Index and the node_id_src / node_id_dst to define the nodes. This is required because in some of the test instances there are loops with the same src and dst and if only these are used we would lose information. Early work I was recreating the summarize() Pandas dataframes and the number of rows were differing which made me scratch my head until I realised the information was being over-written because the src and dst were the same. Including the row.Index as a proxy for the graph segment avoids this.

As the .path_coordinates() are saved as edge properties in NetworkX object we can now reconstruct the original Numpy array based on this information and convert to a Skeleton.

Tests are included for a range of sample graphs and exceptions are raised (and tested) when...

  • Wrong type of object is passed into nx_to_skeleton().
  • The NetworkX object is missing edge properties completely.
  • The edge properties are co-ordinates outside of the original images dimensions.

There may be more edge cases that we could/should test for.

I found problems in reconstructing tinyline as it was a 1-D "image" and so have made this a 2-D "image" with a single row and updated test_line() appropriately. This may be inappropriate as the intention is to have a 1-D skeleton but most skeletons will be 2-D, can be reverted if necessary.

Also on reviewing this still includes WIP on iterative pruning. It desirable this could be removed to keep the PR more focused on going from and to NetworkX graph objects.

jni and others added 6 commits November 27, 2023 13:50
Need to merge changes in though as have conflicting variable names which are already on `jni:main`.
Saves the `.path_coordinates()` to the edge properties of a NetworkX graph, using the `row.Index` _and_ the
`node_id_src` / `node_id_dst` to define the nodes. This is required because in some instances there are loops with the
same `src` and `dst` and if only these are used we would lose information, including the `row.Index` as a proxy for the
graph segment avoids this.

As the `.path_coordinates()` are saved as `edge` properties in NetworkX object we can now reconstruct the original Numpy
array based on this information and convert to a `Skeleton`.

Tests are included for a range of sample graphs and exceptions are raised (and tested) when...

+ Wrong type of object is passed into `nx_to_skeleton()`.
+ The NetworkX object is missing `edge` properties completely.
+ The `edge` properties are co-ordinates outside of the original images dimensions.

There may be more edge cases that we could/should test for.

I found problems with `tinyline` as it was a 1-D "image" and so have made this a 2-D "image" with a single row and
updated `test_line()` appropriately.
@jni
Copy link
Owner

jni commented Jan 11, 2024

This is required because in some of the test instances there are loops with the same src and dst and if only these are used we would lose information.

Storing multiple IDs in one edge is an interesting solution @ns-rse, but would it be simpler to just use a networkx.MultiGraph?

Also on reviewing this still includes WIP on iterative pruning. It desirable this could be removed to keep the PR more focused on going from and to NetworkX graph objects.

I think that's ok. It's often useful to include a motivating example for a new function. 😊

@ns-rse
Copy link
Contributor Author

ns-rse commented Jan 11, 2024

networkx.MultiGraph() does indeed look like the way forward. I'd just not found it yet whilst reading through docs, hence the brute force solution.

I'll work on substituting that when I'm back on this project next week.

Use [networkx.MultiGraph()](https://networkx.org/documentation/stable/reference/classes/multigraph.html).

Greatly simplifies things, we can...

+ Simply add edges with path coordinates as values, if nodes aren't defined they are added automatically.
+ If there are multiple edges between two nodes the respective path coordinates are automatically handled as additional
  edges are stored using unique keys to identify the edge.
+ Node co-ordinates are no longer explicitly required (they are implicitly stored in path co-ordinates)
+ Reconstruction of original Skeleton is simplified as we can access the data (path co-ordinates) directly.

Yapf has done some reformatting (checked it wasn't Black interfering this time).
@ns-rse
Copy link
Contributor Author

ns-rse commented Jan 12, 2024

Was in the groove so went ahead and switched to MultiGraph() which was thankfully relatively painless. Thanks for the pointer @jni

@ns-rse
Copy link
Contributor Author

ns-rse commented Jan 29, 2024

Hi @jni,

Using Multigraph() was an excellent solution, let me know if there is anything else that can be improved.

@jni
Copy link
Owner

jni commented Jan 29, 2024

Thanks for the ping @ns-rse! I will review this today!

@@ -106,5 +106,9 @@ skan = [
[tool.setuptools_scm]
write_to = 'src/skan/_version.py'

[tool.pytest.ini_options]
minversion = "7.4.2"
addopts = "-W ignore"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you add this? (ie what warnings are we ignoring here and is that wise 😅)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently its our own deprecation warnings that were recently added for the removal of _.

src/skan/test/test_csr.py:407
  /home/neil/work/git/hub/ns-rse/skan/src/skan/test/test_csr.py:407: VisibleDeprecationWarning: separator in column name will change to _ in version 0.13; to silence this warning, use `separator='-'` to maintain current behavior and use `separator='_'` to switch to the new default behavior.
    csr.summarize(csr.Skeleton(skeleton1)),

Happy to remove this though I only added it so it was easier to read test output whilst writing them.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave it for now and make an issue to either wrap those in pytest.warns or pass the separator arg.

src/skan/csr.py Outdated
np_img = np.zeros(orig_dim)
x, y = [], []
try:
for edge in g.edges.data(data='path', default=1):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why default=1 here? There shouldn't be any edges without data in the skeleton? If there is one, then it's not a valid input skeleton (?) and we should probably just raise an error? (And this is equally easy to do with None as with 1?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly can't remember now, sorry.

Have removed it and left it as default, tests pass ok.

The defensive guards were imho *overly* defensive and made the function harder to understand. At any rate I was not convinced that they were exhaustively diagnostic, and in some cases caught errors that should be caught by type checkers and following the documented inputs to the function.
We can save necessary attributes in the graph attributes of the multigraph, and they can be used later.

We also store the pixel *values* so that we can restore not only the positions of the skeleton but also the float values (if any).
@jni
Copy link
Owner

jni commented Jan 30, 2024

@ns-rse I've pushed a few changes:

  • I changed the code to use n-dimensional coordinates rather than 2D-only x/y (which wouldn't work for 3D in addition to not working for the admittedly-pretty-absurd 1D)
  • But since I could keep 1D now I reverted those changes 😂
  • I added the shape and dtype of the image to the graph attributes, so they don't need to be passed in to the function
  • I kept the path values in addition to the coordinates — this means that we can restore even float images
  • I removed all of the defensive guards, which I think mostly caught errors that should be caught by the type checker / reading the docs. I'm willing to bring some back if we find some common pitfalls. The one I'm most worried about is people passing nx graphs without the necessary attributes — but I think those should be assertion errors rather than inferring the errors raised by the calls, as it's easier to trivially understand what the code is testing.

I'm seeing now that all the 3.8 tests are failing. I'd be happy to drop 3.8 since most of the scipy ecosystem has dropped it, but I have no earthly idea why they are failing:

______________ test_skeleton_to_nx[skeleton_linear1 (no summary)] ______________

np_skeleton = array([[False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
      ...alse],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False]])
summary = None, edges = 24, nodes = 24

    @pytest.mark.parametrize(
            ('np_skeleton', 'summary', 'nodes', 'edges'),
            [
                    pytest.param(
                            tinycycle, None, 1, 1, id='tinycircle (no summary)'
                            ),
                    pytest.param(tinyline, None, 2, 1, id='tinyline (no summary)'),
                    pytest.param(
                            skeleton0,
                            csr.summarize(csr.Skeleton(skeleton0), separator='_'),
                            4,
                            3,
                            id='skeleton0 (with summary)',
                            ),
                    pytest.param(
                            skeleton1, None, 4, 4, id='skeleton1 (no summary)'
                            ),
                    pytest.param(
                            skeleton1,
                            csr.summarize(csr.Skeleton(skeleton1)),
                            4,
                            4,
                            id='skeleton1 (with summary)'
                            ),
                    pytest.param(
                            skeleton2,
                            csr.summarize(csr.Skeleton(skeleton2)),
                            8,
                            8,
                            id='skeleton2 (with summary)'
                            ),
                    # pytest.param(skeleton3d, None, 7, 7, id='skeleton3d (no summary)'),
                    pytest.param(
                            skeleton_loop1,
                            None,
                            10,
                            10,
                            id='skeleton_loop1 (no summary)'
                            ),
                    pytest.param(
                            skeleton_loop2,
                            None,
                            10,
                            10,
                            id='skeleton_loop2 (no summary)'
                            ),
                    pytest.param(
                            skeleton_linear1,
                            None,
                            24,
                            24,
                            id='skeleton_linear1 (no summary)'
                            ),
                    pytest.param(
                            skeleton_linear2,
                            None,
                            4,
                            3,
                            id='skeleton_linear2 (no summary)'
                            ),
                    pytest.param(
                            skeleton_linear3,
                            None,
                            20,
                            17,
                            id='skeleton_linear3 (no summary)'
                            ),
                    ],
            )
    def test_skeleton_to_nx(
            np_skeleton: npt.NDArray, summary: pd.DataFrame | None, edges: int,
            nodes: int
            ) -> None:
        """Test creation of NetworkX Graph from skeletons arrays and summary."""
        skeleton = csr.Skeleton(np_skeleton)
        skan_nx = csr.skeleton_to_nx(skeleton, summary)
>       assert skan_nx.number_of_nodes() == nodes
E       assert 22 == 24
E        +  where 22 = <bound method Graph.number_of_nodes of <networkx.classes.multigraph.MultiGraph object at 0x1304ca370>>()
E        +    where <bound method Graph.number_of_nodes of <networkx.classes.multigraph.MultiGraph object at 0x1304ca370>> = <networkx.classes.multigraph.MultiGraph object at 0x1304ca370>.number_of_nodes

../../../hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/skan/test/test_csr.py:464: AssertionError
=========================== short test summary info ============================
FAILED test/test_csr.py::test_skeleton_to_nx[skeleton_linear1 (no summary)] - assert 22 == 24
 +  where 22 = <bound method Graph.number_of_nodes of <networkx.classes.multigraph.MultiGraph object at 0x1304ca370>>()
 +    where <bound method Graph.number_of_nodes of <networkx.classes.multigraph.MultiGraph object at 0x1304ca370>> = <networkx.classes.multigraph.MultiGraph object at 0x1304ca370>.number_of_nodes
======================== 1 failed, 98 passed in 20.62s =========================

That is a strange way to fail for a Python version change... 🤔 Any thoughts about what could be causing this?

Anyway, is the pruning working with this? If not, we should probably indeed remove the pruning code from this PR, or bring it up to speed.

Thanks for getting this as far as you have! Looking at it now, I think it's actually feasible to make a Skeleton directly rather than going via an image — we'd need to rebuild the sparse graph structures, the coordinates, and the path ids. Since those are usually a factor of >10 smaller compared to the image, it's probably worth thinking about. But we should probably leave that optimisation for a later PR — first let's get this over the finish line, as I think it's very close. I just wanted to flag that I'm excited about this finally being an efficient operation!

PS in your last commit removing the default value (which I overwrote, sorry!), black formatting had again reformatted the whole file... Might need to check your editor settings again... 😅

So, to do:

  • either finish the pruning or remove it from this PR.
  • figure out why on earth 3.8 is failing
  • (I just saw) uncomment the param testing 3D images

I'm signing off for the day, but let me know if you want me to tackle these in the next few days or if you want to have a crack at them!

@ns-rse
Copy link
Contributor Author

ns-rse commented Jan 30, 2024

Thanks for the rapid improvements @jni

Of the three tasks...

  • (I just saw) uncomment the param testing 3D images

Done, these pass.

  • figure out why on earth 3.8 is failing

Agree its a very strange error.

I've tested locally in a fresh 3.8 environment for sanity and can recreate the problem so started looking at installed versions of packages between Python 3.8 and Python 3.11. There are quite a few differences (e.g. Numpy differences of 1.24.4 under Python 3.8 compared to 1.26.3 under Python 3.11) but the NetworkX versions are identical (both 3.1 under both versions of Python).

The test is failing under test_skeleton_to_nx() so I checked the skeleton that underpins this (by converting the NetworkX object back to Skeleton using csr.nx_to_skeleton()) and its identical to the original skeleton so its baffling that the number of nodes counted differ.

I did some trawling through the NetworkX issues and found a possibly related issue in their own test suite from 2019 test_numpy_type fails under Python 3.8 · Issue #3720 · networkx/networkx. It seems related to the order in which attributes are added under 3.8 and was purportedly addressed in Fix test_numpy_type to pass under Python 3.8 by s-t-e-v-e-n-k · Pull Request #3724 · networkx/networkx (see also detailed info here). This was merged and included in NetworkX 2.5 so I'd be surprised if its the same underlying issue though.

Worth noting that the current version of NetworkX has requires-python = '>=3.10' and so if Skan is to depend on NetworkX going forward it may be easiet to align versions.

If you are happy to do that let me know and I'll adjust Skan's pyproject.toml.

  • either finish the pruning or remove it from this PR.

I started looking at this copying some tests from #214 over but think it might take me a little time to work through this.

In #214 the aim of csr.iteratively_prune_paths() was geared towards the TopoStats use case which for linear skeletons was to prune side-branches leaving the longest branch and for looped skeletons remove side branches and retain the largest loop.

In this PR csr.iteratevly_prune_paths() now works differently, in so much as...

  1. It prunes NetworkX graph objects.
  2. It does so until there is just one object left.

I copied over the tests from #214 and (and tweaked to pass NetworkX representations tocsr.iteratively_prune_paths()) unsurprisingly these failed . I think it would be sensible to remove the pruning from this PR and I'll work on developing tests later this week/next.

If you are happy with this let me know and I shall remove the iterative pruning functions from this PR.

@jni
Copy link
Owner

jni commented Jan 31, 2024

If you are happy with this let me know and I shall remove the iterative pruning functions from this PR.

I think that's a good plan, as it will make this PR easier to review and get in quickly.

If you are happy to do that let me know and I'll adjust Skan's pyproject.toml.

Yeah I don't intend to keep supporting 3.8 but I am simply worried that it's actually exposing some kind of numerical instability in our approach that is simply masked by later versions. So it would be good before ripping it out to understand why it's failing. The approach I would take is:

  • write out the versions of the packages installed in 3.8 vs 3.9
  • make a new env in 3.9 with the default packages
  • downgrade the packages one by one until it breaks

I might also step through the relevant code with a debugger in 3.8 and in 3.10 simultaneously to see where the data starts to change.

Work is saved on `ns-rse/networkx-pruning` and will form a separate PR.
Discovered that single skeleton > networkx failing under Python 3.8 is due to spurious edge detection . See jni#225 for
further details.
@ns-rse
Copy link
Contributor Author

ns-rse commented Apr 19, 2024

See #225 for details of why tests were failing under Python 3.8.

@jni jni marked this pull request as ready for review April 20, 2024 04:38
@jni jni merged commit b703cf6 into jni:main May 1, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants