Integration with scipy and scikit-learn #261

Mec-iS · 2022-08-30T16:02:18Z

One of the integrations we are going to work on is with scikit-learn.

This conversation is meant to collect requirements and features for calling scikit-learn through kglab's abstraction layer.

From my point of view, after looking at the APIs provided by popular data science libraries, these are the scikit-learn and scipy functionalities we could start with:

1. Allow converting kglab's KnowledgeGraph data structures to an observations matrix (to be defined), an adjacency matrix, and a condensed distance matrix as defined by scipy. This will allow building further flows (or "pipelines", i.e. chains of function calls) that users can assemble to go from a KnowledgeGraph representation to a graph algebra representation. This is critical, as we need to pick first principles, or provide different alternatives, according to the type of graph and the tasks users may want to accomplish.
2. After 1, let's start with an example flow in kglab for SciPy's hierarchical clustering. It would be nice to have a flow that allows simple clustering. This implies providing switches for:
   1. linkage procedures
   2. tree building, like sklearn.cluster.ward_tree
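As a rough sketch of steps 1 and 2, the snippet below goes from a few (subject, predicate, object) triples to an adjacency matrix, then to a condensed distance matrix, and finally to SciPy's hierarchical clustering. The toy triples, the hypothetical helper `triples_to_adjacency`, and the choice of Jaccard distance between adjacency rows are illustrative assumptions, not kglab's API; `pdist`, `linkage`, `fcluster` and `sklearn.cluster.ward_tree` are existing SciPy/scikit-learn functions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist
from sklearn.cluster import ward_tree

# Hypothetical toy graph: two disjoint triangles as (subject, predicate, object) triples.
TRIPLES = [
    ("a", "knows", "b"),
    ("a", "knows", "c"),
    ("b", "knows", "c"),
    ("d", "knows", "e"),
    ("e", "knows", "f"),
    ("d", "knows", "f"),
]


def triples_to_adjacency(triples):
    """Hypothetical helper: map node labels to row/column indices and build a
    binary, undirected adjacency matrix (predicates are ignored)."""
    nodes = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    index = {node: i for i, node in enumerate(nodes)}
    adj = np.zeros((len(nodes), len(nodes)), dtype=bool)
    for s, _, o in triples:
        adj[index[s], index[o]] = adj[index[o], index[s]] = True
    return nodes, adj


nodes, adj = triples_to_adjacency(TRIPLES)

# Condensed distance matrix: the upper triangle of the pairwise distances as a 1-D array.
condensed = pdist(adj, metric="jaccard")

# Switch 1: the linkage procedure ("single", "complete", "average", "ward", ...).
Z = linkage(condensed, method="average")
labels = fcluster(Z, t=2, criterion="maxclust")

# Switch 2: tree building with scikit-learn's Ward tree on the raw observations.
children, n_components, n_leaves, parents = ward_tree(adj.astype(float))
```

Note that `ward_tree` works on Euclidean feature vectors (here the adjacency rows) rather than on the condensed distance matrix, so the two switches do not necessarily consume the same intermediate representation.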

Other possible examples:

These are in no particular order for now. It will take some time to figure out which principles to import from scikit-learn and scipy in order to build proper user flows between the knowledge graph, as represented in RDF/kglab, and graph algebra representations.

Please provide feedback and suggestions. I will create a GitHub project around this effort.

cc: @tomaarsen @SultanOrazbayev


ceteri · 2022-09-01T06:05:21Z

Wonderful! This is super helpful.
The nearest neighbor parts would have some immediate use cases.
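To give one concrete idea of what a nearest-neighbor flow could look like, the sketch below treats each node's row of a binary adjacency matrix as its feature vector and asks scikit-learn's `NearestNeighbors` for the most similar node under Jaccard distance. The toy matrix and the "adjacency rows as features" choice are assumptions for illustration only; node embeddings would be another option.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical binary adjacency matrix for nodes 0..4 (row i = neighbors of node i).
ADJ = np.array(
    [
        [0, 1, 1, 1, 0],
        [1, 0, 1, 1, 0],
        [1, 1, 0, 0, 0],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 1, 0],
    ],
    dtype=bool,
)

# Brute-force search computes the boolean "jaccard" metric directly.
nn = NearestNeighbors(n_neighbors=2, metric="jaccard", algorithm="brute").fit(ADJ)
distances, indices = nn.kneighbors(ADJ)

# Column 0 is each node itself (distance 0); column 1 is its closest other node.
# Ties (e.g. node 2 is equally close to nodes 0 and 1) are broken arbitrarily.
closest = indices[:, 1]
```

Nodes 3 and 4 end up at distance 1 from everything, because row-based Jaccard only rewards shared neighbors, not direct links; that is one of the design choices such a flow would need to expose.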

BTW, there's already the SubgraphMatrix class in subg.py which handles the transform/inverse_transform from an RDF graph to:

pandas.DataFrame
iGraph (adjacency matrix, slightly odd/tangled format)
NetworkX (adjacency matrix, as an edge list)
cuGraph (adjacency matrix, as an edge list for cuDF)
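For readers unfamiliar with the pattern, "transform/inverse_transform" here means mapping RDF nodes to integer row/column IDs and back. The class below is a hypothetical, minimal stand-in written for illustration; it is not kglab's `SubgraphMatrix` and does not reproduce its signatures.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


class ToySubgraphMatrix:
    """Hypothetical stand-in for illustration: not kglab's SubgraphMatrix."""

    def __init__(self) -> None:
        self.id_of: Dict[Hashable, int] = {}
        self.node_of: List[Hashable] = []

    def transform(self, node: Hashable) -> int:
        # Assign the next free integer ID the first time a node is seen.
        if node not in self.id_of:
            self.id_of[node] = len(self.node_of)
            self.node_of.append(node)
        return self.id_of[node]

    def inverse_transform(self, node_id: int) -> Hashable:
        # Recover the original RDF node from its integer ID.
        return self.node_of[node_id]

    def edge_list(self, triples: Iterable[Tuple]) -> List[Tuple[int, int]]:
        # Edge-list form of the graph; predicates are dropped in this sketch.
        return [(self.transform(s), self.transform(o)) for s, _, o in triples]


sm = ToySubgraphMatrix()
edges = sm.edge_list([("ex:a", "ex:p", "ex:b"), ("ex:b", "ex:p", "ex:c")])
```

Keeping the ID mapping on the object is what makes the inverse direction cheap: any algorithm output expressed in integer IDs can be mapped back to RDF nodes.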

Mec-iS · 2022-09-01T10:29:16Z

We probably want some methods that return numpy.array; I will certainly reuse what is already there.
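A minimal sketch of such a method, assuming the graph has already been reduced to an edge list of integer IDs: a dense `numpy.array` adjacency matrix, plus a sparse SciPy variant, since knowledge graphs are usually sparse. Both helper names are hypothetical.

```python
import numpy as np
from scipy.sparse import coo_matrix


def edges_to_adjacency(edges, n_nodes, directed=True):
    """Hypothetical helper: dense NumPy adjacency matrix from (source_id, target_id)
    pairs with IDs in range(n_nodes)."""
    adj = np.zeros((n_nodes, n_nodes), dtype=np.int8)
    if edges:
        src, dst = np.asarray(edges).T
        adj[src, dst] = 1
        if not directed:
            adj[dst, src] = 1
    return adj


def edges_to_sparse(edges, n_nodes):
    """Hypothetical helper: sparse CSR adjacency matrix (directed).
    Duplicate edges are summed by SciPy during the COO -> CSR conversion."""
    src, dst = np.asarray(edges).T
    data = np.ones(len(edges), dtype=np.int8)
    return coo_matrix((data, (src, dst)), shape=(n_nodes, n_nodes)).tocsr()


dense = edges_to_adjacency([(0, 1), (1, 2)], 3)
sparse = edges_to_sparse([(0, 1), (1, 2)], 3)
```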

Mec-iS · 2022-09-06T12:18:49Z

@SultanOrazbayev mentioned the importance of having a descriptive summary of general metrics about a graph, something like pandas.DataFrame.describe(). These are the metrics that could be useful in a hypothetical SubgraphMatrix.describe():

number of nodes, number of edges
density
triangles
reciprocity
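The four metrics above can all be computed from a NumPy adjacency matrix. The function below is a hypothetical sketch of such a `describe()`, assuming a directed graph given as a square binary matrix without self-loops; it is not an existing kglab method.

```python
import numpy as np


def describe(adj) -> dict:
    """Hypothetical describe()-style summary for a directed graph given as a
    square binary adjacency matrix without self-loops."""
    a = (np.asarray(adj) != 0).astype(np.int64)
    n = a.shape[0]
    m = int(a.sum())
    # Directed density: existing edges over the n * (n - 1) possible ones.
    density = m / (n * (n - 1)) if n > 1 else 0.0
    # Reciprocity: share of edges u->v whose reverse edge v->u also exists.
    reciprocity = float((a * a.T).sum()) / m if m else 0.0
    # Triangles counted on the undirected version: trace(U^3) / 6.
    u = ((a + a.T) > 0).astype(np.int64)
    triangles = int(np.trace(u @ u @ u)) // 6
    return {
        "nodes": n,
        "edges": m,
        "density": density,
        "triangles": triangles,
        "reciprocity": reciprocity,
    }


summary = describe([[0, 1, 1], [1, 0, 1], [0, 0, 0]])
```

The dense `trace(U^3)` triangle count is fine for small graphs; a sparse-matrix or library-based implementation would be needed for realistic knowledge graphs.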

tomaarsen · 2022-09-08T19:29:22Z

Agreed, sometimes it's hard to actually understand what kind of graph you're using.

Mec-iS self-assigned this Aug 30, 2022

Mec-iS added the enhancement (New feature or request) label Aug 30, 2022

ceteri added this to the Machine Learning integration milestone Aug 31, 2022

Mec-iS mentioned this issue Sep 5, 2022

Starting graph algebra #267

Merged

SultanOrazbayev mentioned this issue Dec 6, 2022

nx.info() no longer informative networkx/networkx#5326

Open
