Starting graph algebra #267

Merged
merged 14 commits into from
Sep 6, 2022
199 changes: 199 additions & 0 deletions examples/graph_algebra/gla_ex0_0.ipynb
@@ -0,0 +1,199 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Graph Algebra with `kglab`\n",
"\n",
"## intro\n",
"`kglab` provides tools to access graph data from multiple sources and build a `KnowledgeGraph` that data scientists can use easily. For a thorough explanation of how to use data stored as triples and how to load it into `kglab`, see the examples in the `examples/` directory. The examples in this directory (`examples/graph_algebra/`) introduce the graph algebra capabilities available for the graphs the user has loaded.\n",
"\n",
"## basic load and querying\n",
"The typical workflow starts by loading your data into a `KnowledgeGraph`:\n",
"\n",
"1. Instantiate a graph from a dataset:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<kglab.kglab.KnowledgeGraph at 0x7f283f3d3940>"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# for use in tutorial and development; do not include this `sys.path` change in production:\n",
"import sys ; sys.path.insert(0, \"../../\")\n",
"from os.path import dirname\n",
"import kglab\n",
"import os\n",
"\n",
"namespaces = {\n",
" \"foaf\": \"http://xmlns.com/foaf/0.1/\",\n",
" \"gorm\": \"http://example.org/sagas#\",\n",
" \"rel\": \"http://purl.org/vocab/relationship/\",\n",
" }\n",
"\n",
"kg = kglab.KnowledgeGraph(\n",
" name = \"Happy Vikings KG example for SKOS/OWL inference\",\n",
" namespaces=namespaces,\n",
" )\n",
"\n",
"kg.load_rdf(dirname(dirname(os.getcwd())) + \"/dat/gorm.ttl\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"2. A subgraph can be created from a SPARQL query that binds a `?subject` and an `?object` variable:\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"query = \"\"\"SELECT ?subject ?object\n",
"WHERE {\n",
" ?subject rdf:type gorm:Viking .\n",
" ?subject gorm:childOf ?object .\n",
"}\n",
"\"\"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## define a subgraph\n",
"In this case we are looking for the network of parent-child relations among members of a Viking family.\n",
"\n",
"With this query we can define a **subgraph**, which gives access to **graph algebra** capabilities:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from kglab.subg import SubgraphMatrix\n",
"\n",
"subgraph = SubgraphMatrix(kg=kg, sparql=query)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## compute adjacency matrices\n",
"Let's start with the basic adjacency matrix (usually denoted `A`):"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[0., 1., 1., 0., 0.],\n",
" [0., 0., 0., 1., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 1.],\n",
" [0., 0., 0., 0., 0.]])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"adj_matrix = subgraph.to_adjacency()\n",
"adj_matrix"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here, all the subjects and objects have been mapped to integer indices, from 0 to the number of nodes minus one. Row `i` of the matrix lists the outgoing edges of the entity with index `i`: for example, the entity with index 0 is adjacent to (has a directed edge to) the entity with index 1. The graph is directed because the relationship `gorm:childOf` goes from child to parent. Let's turn it into an undirected graph to see the relation symmetrically, in both the child-parent and parent-child directions."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[0., 1., 1., 0., 0.],\n",
" [1., 0., 0., 1., 0.],\n",
" [1., 0., 0., 0., 0.],\n",
" [0., 1., 0., 0., 1.],\n",
" [0., 0., 0., 1., 0.]])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"undirected_adj_mtx = subgraph.to_undirected()\n",
"undirected_adj_mtx"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now the relationship is a generic, symmetric \"parenthood\" relation rather than only a directed child-parent one."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.8.10 64-bit ('.venv': venv)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "de68f9b565e1e230f4433adb1a318d8f3a0dfad0917fa0c696727472c8ddadbf"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}
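A minimal, standalone sketch of what `to_adjacency()` and `to_undirected()` compute in the notebook above, assuming only `networkx` and `numpy`. The integer node labels stand in for the Viking entities, and the edge list is hypothetical, chosen to reproduce the matrices printed in the notebook; each edge points from child to parent, as `gorm:childOf` does.

```python
import networkx as nx
import numpy as np

# Hypothetical child -> parent edges; node i becomes row/column i of the matrix.
g = nx.DiGraph()
g.add_nodes_from(range(5))
g.add_edges_from([(0, 1), (0, 2), (1, 3), (3, 4)])

# Directed adjacency matrix: adj[i, j] == 1 when node i is a child of node j.
adj = nx.to_numpy_array(g)

# Undirected version: each edge is kept once, so the matrix becomes symmetric.
undirected = nx.to_numpy_array(g.to_undirected())

# For an unweighted graph this equals the element-wise max of A and its transpose.
assert np.array_equal(undirected, np.maximum(adj, adj.T))

print(adj)
print(undirected)
```

The symmetric matrix simply records both directions of every child-parent edge, which is why each `1` above the diagonal in `adj` is mirrored below it in `undirected`.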
65 changes: 65 additions & 0 deletions kglab/algebra.py
@@ -0,0 +1,65 @@
"""
Working with `SubgraphMatrix` as vectorized representation.
Additions to functionalities present in `subg.py`.
Integrate `scipy` and `scikit-learn` functionalities.

see license https://github.com/DerwenAI/kglab#license-and-copyright
"""
import typing

import networkx as nx
from networkx import DiGraph

class AlgebraMixin:
    """
    Provides methods to work with graph algebra using `SubgraphMatrix` data.

    NOTE: provide optional Oxigraph support for fast in-memory computation
    """
    nx_graph: typing.Optional[DiGraph] = None

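    def to_undirected(self):
        """
        Return the adjacency (dense) matrix of the undirected version of the KG.

        returns:
            `numpy.array`: the array representation in `numpy` standard
        """
        # ensure `nx_graph` is populated, as the other methods do
        self.check_attributes()
        return nx.to_numpy_array(self.nx_graph.to_undirected())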

    def to_adjacency(self):
        """
        Return adjacency (dense) matrix for the KG.
        [Relevant NetworkX interface](https://networkx.org/documentation/stable/reference/convert.html#id2)

        returns:
            `numpy.array`: the array representation in `numpy` standard
        """
        self.check_attributes()
        return nx.to_numpy_array(self.nx_graph)

    def to_incidence(self):
        """
        Return the (unoriented) incidence (dense) matrix for the KG,
        with one row per node and one column per edge.
        [Relevant scipy docs](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html)

        returns:
            `numpy.array`: the array representation in `numpy` standard
        """
        self.check_attributes()
        return nx.incidence_matrix(self.nx_graph).toarray()

    def to_laplacian(self):
        """
        Return Laplacian matrix for the KG. Graph is turned into undirected.
        [docs](https://networkx.org/documentation/stable/reference/generated/networkx.linalg.laplacianmatrix.laplacian_matrix.html)

        returns:
            `numpy.array`: the array representation in `numpy` standard
        """
        self.check_attributes()
        return nx.laplacian_matrix(self.nx_graph.to_undirected()).toarray()

    def to_scipy_sparse(self):
        """
        Return graph in CSR format (optimized for matrix-matrix operations).

        returns:
            SciPy sparse array (CSR): graph adjacency matrix.
        """
        self.check_attributes()
        return nx.to_scipy_sparse_array(self.nx_graph)
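The matrices returned by `AlgebraMixin` are linked by standard identities. The sketch below checks them on a hypothetical 5-node child-to-parent graph, assuming `networkx` >= 2.7 (for `to_scipy_sparse_array`) and `scipy`. It shows that the Laplacian equals the degree matrix minus the adjacency matrix of the undirected graph, and that the *oriented* incidence matrix `B` satisfies `B @ B.T == L` when the graph has no reciprocal edges. Note that `nx.incidence_matrix()` is unoriented by default, which is what `to_incidence()` returns.

```python
import networkx as nx
import numpy as np

# Hypothetical child -> parent graph (no reciprocal edges, no self-loops).
g = nx.DiGraph()
g.add_nodes_from(range(5))
g.add_edges_from([(0, 1), (0, 2), (1, 3), (3, 4)])
u = g.to_undirected()

adj_u = nx.to_numpy_array(u)
degree = np.diag(adj_u.sum(axis=1))

# Laplacian of the undirected graph, as in `to_laplacian()`: L = D - A.
lap = nx.laplacian_matrix(u).toarray()
assert np.array_equal(lap, degree - adj_u)

# Oriented incidence matrix (n_nodes x n_edges): -1 at each edge's tail, +1 at its head.
b = nx.incidence_matrix(g, oriented=True).toarray()
assert b.shape == (5, 4)
assert np.array_equal(b @ b.T, lap)

# Sparse CSR adjacency, as in `to_scipy_sparse()`: squaring it counts 2-hop paths,
# i.e. child -> grandparent links in this example.
a_sparse = nx.to_scipy_sparse_array(g, format="csr")
two_hop = (a_sparse @ a_sparse).toarray()
print(np.argwhere(two_hop > 0))  # (grandchild, grandparent) index pairs
```

Working with the CSR form keeps products such as `A @ A` cheap on large, sparse knowledge graphs, where the dense `numpy` arrays would mostly hold zeros.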
43 changes: 43 additions & 0 deletions kglab/networks.py
@@ -0,0 +1,43 @@
"""
Working with `SubgraphMatrix` as vectorized representation.
Additions to functionalities present in `subg.py`.
Integrate `scikit-network` functionalities.

see license https://github.com/DerwenAI/kglab#license-and-copyright
"""

import sknetwork as skn

class NetAnalysisMixin:
    """
    Provides methods for network analysis tools to work with `KnowledgeGraph`.
    """
    def get_distances(self, adj_mtx):
        """
        Compute distances according to an adjacency matrix.

        adj_mtx:
            numpy.array: adjacency matrix for the graph.
        """
        self.check_attributes()
        return skn.path.get_distances(adj_mtx)

    def get_shortest_path(self, adj_mtx, src, dst):
        """
        Return the shortest path from sources to destinations according to an adjacency matrix.

        adj_mtx:
            numpy.array: adjacency matrix for the graph.
        src:
            int or iterable: indices of source nodes
        dst:
            int or iterable: indices of destination nodes

        returns:
            list of int: a path of indices
        """
        self.check_attributes()
        return skn.path.get_shortest_path(adj_mtx, src, dst)


# number of nodes, number of edges
# density
# triangles
# reciprocity
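`NetAnalysisMixin` delegates to scikit-network, whose path-function signatures may differ across releases. As a version-independent illustration of the same computation, the sketch below swaps in `scipy.sparse.csgraph.shortest_path` (hop-count distances on an unweighted adjacency matrix) and rebuilds one path from the predecessor matrix. The adjacency matrix is the hypothetical 5-node undirected example from the notebook, and `path_between` is a hypothetical helper, not part of `kglab`.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

# Hypothetical symmetric adjacency, e.g. the output of `to_undirected()`.
adj = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

# Hop counts between every pair of nodes, plus predecessors for path recovery.
dist, pred = shortest_path(
    csr_matrix(adj), directed=False, unweighted=True, return_predecessors=True
)


def path_between(pred: np.ndarray, src: int, dst: int) -> list:
    """Walk the predecessor row of `src` backwards from `dst`; [] if unreachable."""
    if src != dst and pred[src, dst] < 0:
        return []  # scipy marks "no predecessor" with a negative sentinel
    path = [dst]
    while path[-1] != src:
        path.append(int(pred[src, path[-1]]))
    return path[::-1]


print(dist[2, 4])               # number of hops from node 2 to node 4
print(path_between(pred, 2, 4))
```

Unreachable pairs show up as `inf` in `dist`, so the same matrix can also feed the graph-level metrics listed above.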
10 changes: 7 additions & 3 deletions kglab/query/mixin.py
@@ -81,7 +81,8 @@ def query_as_df (
pythonify: bool = True,
) -> pd.DataFrame:
"""
-Wrapper for [`rdflib.Graph.query()`](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.html?highlight=query#rdflib.Graph.query) to perform a SPARQL query on the RDF graph.
+Wrapper for [`rdflib.Graph.query()`](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.html?highlight=query#rdflib.Graph.query)
+to perform a SPARQL query on the RDF graph.

sparql:
text for the SPARQL query
@@ -123,7 +124,8 @@ def visualize_query (
notebook: bool = False,
) -> pyvis.network.Network:
"""
-Visualize the given SPARQL query as a [`pyvis.network.Network`](https://pyvis.readthedocs.io/en/latest/documentation.html#pyvis.network.Network)
+Visualize the given SPARQL query as a
+[`pyvis.network.Network`](https://pyvis.readthedocs.io/en/latest/documentation.html#pyvis.network.Network)

sparql:
input SPARQL query to be visualized
@@ -144,7 +146,9 @@ def n3fy (
pythonify: bool = True,
) -> typing.Any:
"""
-Wrapper for RDFlib [`n3()`](https://rdflib.readthedocs.io/en/stable/utilities.html?highlight=n3#serializing-a-single-term-to-n3) and [`toPython()`](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.html?highlight=toPython#rdflib.Variable.toPython) to serialize a node into a human-readable representation using N3 format.
+Wrapper for RDFlib [`n3()`](https://rdflib.readthedocs.io/en/stable/utilities.html?highlight=n3#serializing-a-single-term-to-n3)
+and [`toPython()`](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.html?highlight=toPython#rdflib.Variable.toPython)
+to serialize a node into a human-readable representation using N3 format.

node:
must be a [`rdflib.term.Node`](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.html?highlight=Node#rdflib.term.Node)