Locate nearest clusters for given data #214

pulquero · 2021-02-14T00:13:13Z

Fixes #208.

codecov · 2021-02-14T00:14:34Z

Codecov Report

Merging #214 (bf9d55e) into master (56dbd66) will increase coverage by 0.19%.
The diff coverage is 86.04%.

❗ Current head bf9d55e differs from pull request most recent head 0bd7b00. Consider uploading reports for the commit 0bd7b00 to get more accurate results

@@            Coverage Diff             @@
##           master     #214      +/-   ##
==========================================
+ Coverage   79.62%   79.82%   +0.19%     
==========================================
  Files          11       11              
  Lines         854      892      +38     
  Branches      186      199      +13     
==========================================
+ Hits          680      712      +32     
- Misses        141      144       +3     
- Partials       33       36       +3

| Impacted Files | Coverage Δ | |
|---|---|---|
| kmapper/kmapper.py | `88.67% <82.35%> (-0.79%)` | ⬇️ |
| kmapper/cover.py | `88.88% <100.00%> (+0.46%)` | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 56dbd66...0bd7b00. Read the comment docs.

sauln · 2021-03-15T15:37:21Z

@pulquero, you have 2 PRs (this and #213). Should we review both, or does this one subsume the other?

pulquero · 2021-03-15T15:58:42Z

This one subsumes the other, so you could just review this one. Or if you want to do it in parts, start with the other.

deargle

Can you add a mini example to /examples like a mini vignette?

Edit: example here: https://github.com/sphinx-gallery/sphinx-gallery/blob/master/examples/no_output/just_code.py

deargle · 2021-03-18T18:17:18Z

kmapper/kmapper.py

+          ----------
+          newdata : Numpy array
+              New dataset. Accepts both 1-D and 2-D array.
+          nodes : dict


In theory, this could be all of the nodes from a graph, right? It would be horribly inefficient because there's no way that a point would be close to a cluster that wasn't even in its open set, but still.

If I'm thinking correctly, if it's just for efficiency, then nearest_clusters could itself receive the full graph, and itself call clusters_from_cover before looping over the nodes to get cluster_members.

Am I thinking correctly?

Also, I lean towards standardizing on replacing clusters with nodes throughout.

And eventually, but not now, replacing cube with openset throughout;

if so, that would look something like:

    def nearest_nodes(self, newdata, graph, cover, data, nn):
        cube_ids = cover.find(newdata)
        nodes = self.find_nodes(graph, cube_ids)
        # then the rest unchanged...
        nn_data = []          # member rows of every candidate node
        nn_cluster_ids = []   # node id for each row of nn_data
        for cluster_id, cluster_members in nodes.items():
            cluster_data = data[cluster_members]
            nn_data.append(cluster_data)
            nn_cluster_ids.append([cluster_id] * len(cluster_data))
        nn_data = np.vstack(nn_data)
        nn_cluster_ids = np.concatenate(nn_cluster_ids)
        nn.fit(nn_data)
        nn_ids = nn.kneighbors(newdata, return_distance=False)
        return np.unique(nn_cluster_ids[nn_ids])
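
For anyone who wants to try the idea outside kmapper, here is a minimal standalone sketch of the same lookup. It is not the PR's implementation. It assumes `nodes` is a dict mapping node ids to lists of row indices into `data`, and it swaps the `nn` estimator for a brute-force NumPy distance computation so that it runs without scikit-learn. The free-function signature and the parameter `k` are made up for illustration.

```python
import numpy as np


def nearest_nodes(newdata, nodes, data, k=1):
    """Return the ids of the nodes owning the k nearest members of each new point.

    Hypothetical free function: `nodes` maps node id -> list of row indices
    into `data` and must not be empty.
    """
    # Treat a 1-D input as a single point, as the PR docstring allows.
    newdata = np.atleast_2d(newdata)

    # Stack the members of every candidate node and remember, for each
    # stacked row, which node it came from.
    member_rows = []
    member_node_ids = []
    for node_id, members in nodes.items():
        member_rows.append(data[members])
        member_node_ids.extend([node_id] * len(members))
    member_rows = np.vstack(member_rows)
    member_node_ids = np.array(member_node_ids)

    # Brute-force Euclidean distances, shape (n_new_points, n_member_rows).
    # This stands in for nn.fit(...) followed by nn.kneighbors(...).
    diffs = newdata[:, None, :] - member_rows[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]

    return np.unique(member_node_ids[nearest])


# Toy data: two well-separated groups, one node per group.
data = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
nodes = {"cube0_cluster0": [0, 1], "cube1_cluster0": [2, 3]}

single = nearest_nodes(np.array([4.9, 5.2]), nodes, data)
batch = nearest_nodes(np.array([[0.05, 0.0], [5.0, 5.0]]), nodes, data)
```

Because the result goes through `np.unique`, a batch of new points yields the union of their nearest nodes rather than one answer per point. The same holds for the snippet above.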

deargle · 2021-03-18T18:43:20Z

kmapper/kmapper.py

@@ -827,6 +827,63 @@ def data_from_cluster_id(self, cluster_id, graph, data):
        else:
            return np.array([])

+    def clusters_from_cover(self, cube_ids, graph):
+        """Returns the clusters and their members from the subset of the cover spanned by the given cube_ids


Thinking out loud. I'm trying to think of another name. Kmapper has a separate Cover class, so calling this clusters_from_cover suggests to me that a cover should be passed, but it isn't.

But a Cover doesn't have clusters, so I don't think this should go in the Cover class.

If graph were a class, this would go in there as graph.find_clusters_by_cube_ids(cube_ids) or something.

Sort-of following the pattern from the last PR, maybe we rename this to ~~find_clusters~~ find_nodes
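
To make the proposal concrete, here is a rough sketch of what a `find_nodes` helper could look like as a plain function. It rests on two assumptions that the diff above does not confirm: that `graph["nodes"]` maps node ids to member lists, and that node ids follow a `cube<index>_cluster<label>` layout. The argument order mirrors `clusters_from_cover(self, cube_ids, graph)` from the diff, and the body is illustrative only.

```python
def find_nodes(cube_ids, graph):
    """Return {node_id: members} for the nodes that live in the given cubes.

    Assumes node ids look like "cube<index>_cluster<label>"; this layout is
    an assumption of the sketch, not something the diff guarantees.
    """
    wanted = set(cube_ids)
    found = {}
    for node_id, members in graph["nodes"].items():
        # Take the "cube<index>" prefix and strip the literal "cube".
        cube_part = node_id.split("_")[0]
        cube_index = int(cube_part[len("cube"):])
        if cube_index in wanted:
            found[node_id] = members
    return found


# Toy graph with two nodes in cube 0 and one node in cube 3.
graph = {
    "nodes": {
        "cube0_cluster0": [0, 1],
        "cube0_cluster1": [2],
        "cube3_cluster0": [3, 4],
    }
}

selected = find_nodes([0], graph)
```

Parsing the cube index out of the id string is brittle. If graph ever became a class, as suggested above, storing the cube index alongside each node would avoid it.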

deargle · 2021-03-18T19:02:23Z

In general in the above, I argue for converting nearest_nodes (formerly nearest_clusters) into a one-stop function that calls the other two created by this and the previous pr.

deargle · 2021-03-18T19:04:33Z

Side note, @sauln I didn't realize that sphinx-gallery would even render files that don't plot_ anything, see here. I'll go back and link to the rendered version of make-circles in our docs, it currently links out to github.

pulquero · 2021-03-19T20:55:04Z

Hit a bit of a snag, but we should be good now.

sauln mentioned this pull request Mar 15, 2021

Added clusters_from_cover to kmapper. #213

Closed

deargle reviewed Mar 18, 2021

pulquero force-pushed the closest_clusters branch 2 times, most recently from 458e094 to 4ae98ea Compare March 19, 2021 19:50

added clusters_from_cover to kmapper.

39d6a09

pulquero force-pushed the closest_clusters branch from 4ae98ea to 3bd6116 Compare March 19, 2021 20:40

Mark Hale added 2 commits March 19, 2021 20:46

added nearest_nodes and find_nodes to kmapper.

aded017

Added nearest_node example.

0bd7b00

pulquero force-pushed the closest_clusters branch from 3bd6116 to 0bd7b00 Compare March 19, 2021 20:46
