Skip to content

Latest commit

 

History

History
356 lines (267 loc) · 17.9 KB

telco.adoc

File metadata and controls

356 lines (267 loc) · 17.9 KB

Telco Network Management Using A Graph Database

Introduction

Telecommunication companies need a way to predict and warn customers in advance of any service interruptions in order to maintain customer service agreements and avoid financial penalties of unplanned downtime. This GraphGist explores how we can use Cypher to find weak areas of a network and gain insight into how to efficiently allocate resources.

Create the Network

In the following example network, one service node provides services to a variety of clients through a network of linked resources. Each link can be in one of three states: 'open', 'closed', or 'broken'. If a resource is using a link, it is considered 'closed', and no other resource can use it. If a resource is not using a link, it is considered 'open'. If something happens to a link that causes it to become unusable, the link is considered 'broken', and the network is at risk.

The Nodes: Although abstract in this example, nodes in this model represent anything that receives, sends, or relays information. In the real world, they would include but not be limited to datacenters, switchboards, hospitals, restaurants. In this Neo4j graph, the majority of nodes carry the RESOURCE label. There is one node with a SERVICE label. Everything connected to the SERVICE node is part of the telecommunication network.

The Edges: Also abstract in this example, the edges in this model represent the structures through which information is passed. In the real world, these could be telephone wires, wireless signals, Ethernet cables, or similar. In this Neo4j graph, edges carry the LINKS label. Although inherently directional, this assumes all LINKS are bidirectional.

//create the resource nodes
CREATE 	(`1`:RESOURCE {resource_id:1}),
			(`2`:RESOURCE {resource_id:2}),
			(`3`:RESOURCE {resource_id:3}),
			(`4`:RESOURCE {resource_id:4}),
			(`5`:RESOURCE {resource_id:5}),
			(`6`:RESOURCE {resource_id:6}),
			(`7`:RESOURCE {resource_id:7}),
			(`8`:RESOURCE {resource_id:8}),
			(`9`:RESOURCE {resource_id:9}),
			(`10`:RESOURCE {resource_id:10}),
			(`11`:RESOURCE {resource_id:11}),
			(`12`:RESOURCE {resource_id:12}),
			(`13`:RESOURCE {resource_id:13}),
			(`14`:SERVICE {service_id:1}),
			(`15`:RESOURCE {resource_id:15})

//create the channels
CREATE	(`1`)-[:LINKS {status:'open'}]->(`2`),(`1`)-[:LINKS {status:'open'}]->(`3`),(`1`)-[:LINKS {status:'open'}]->(`4`),
			(`2`)-[:LINKS {status:'open'}]->(`3`),(`2`)-[:LINKS {status:'open'}]->(`4`),
			(`3`)-[:LINKS {status:'open'}]->(`5`),
			(`4`)-[:LINKS {status:'open'}]->(`5`),
			(`6`)-[:LINKS {status:'open'}]->(`1`),
			(`7`)-[:LINKS {status:'open'}]->(`6`),(`7`)-[:LINKS {status:'open'}]->(`8`),
			(`8`)-[:LINKS {status:'open'}]->(`6`),(`8`)-[:LINKS {status:'open'}]->(`9`),
			(`10`)-[:LINKS {status:'open'}]->(`11`),(`10`)-[:LINKS {status:'open'}]->(`12`),
			(`12`)-[:LINKS {status:'open'}]->(`11`),
			(`13`)-[:LINKS {status:'open'}]->(`8`),(`13`)-[:LINKS {status:'open'}]->(`9`),(`13`)-[:LINKS {status:'open'}]->(`10`),
			(`14`)-[:LINKS {status:'open'}]->(`7`),(`14`)-[:LINKS {status:'open'}]->(`8`),(`14`)-[:LINKS {status:'open'}]->(`13`),
			(`15`)-[:LINKS {status:'open'}]->(`9`)
RETURN *

Uncovering the Need for Additional Redundancy

Although most of the resources in our example network above are connected to the main service by multiple links, there are some subsections of the network that are connected by only one link (a bridge). Although this lack of redundancy might be necessary due to lack of funds or other reasons, if the bridge fails (a fire melts the cables connecting two resources, a hurricane takes out a datacenter) a large chunk of our network will suffer an outage. In short, redundancy is necessary both for the avoidance of outages and well-planned redundancy necessary to keep costs under control.

Finding and Dealing with Bridges

A bridge refers to an edge (or in Neo4j, a relationship), that when deleted, increases the number of connected components. In the example below, both edge a or b could be bridges, as their deletion would result in two connected components, only one of which would include Bob.

example

Let’s find bridges in our network:

MATCH path = (a:RESOURCE)-[r0]->(b:RESOURCE)
WHERE NOT (a)-[*2..3]-(b)
RETURN DISTINCT a.resource_id AS `Endpoint 1 ID`, r0.status AS `Status`, b.resource_id AS `Endpoint 2 ID`

Naturally, if anything happens to any of these channels a chunk of our network will go down. Ideally, we’d have redundancies and multiple backups to take care of any possible outage. However, resources and the implementation of resources are expensive.

Resilience During Unplanned Network Outages

Introduction

The Subgraph on the 1 Side of the 1-6 Path

Let’s take a look at the subgraph connected by the bridge between network resource 1 and 6. Let’s label all of the nodes in this subgraph 'subgraph':

MATCH path = (a:RESOURCE { resource_id:1 })-[*1..3]-(c:RESOURCE)
WHERE ALL (n IN nodes(path) WHERE NOT n.resource_id=6)
FOREACH (n IN nodes(path)| SET n :subgraph)
RETURN path

The members of subgraph SUBGRAPH

MATCH path = shortestPath((a:subgraph)-[*..3]-(b))
WHERE NOT (b:subgraph)
RETURN DISTINCT a.resource_id AS `RESOURCE ID`, min(length(relationships(path))) AS `DISTANCE TO NETWORK`
ORDER BY `RESOURCE ID` ASC

As demonstrated in the table and image, all information from members of the subgraph must pass through resource 1 to reach a resource that is not in the subgraph. Resources 2, 3, and 4 are all one hop away from resource 1, and therefore have their own links to 1, but resource 5, being two hops away, must share at least one link with another resource on its way to 1 and the rest of the network.

Even without analysis, resource 5 seems like the most likely candidate for the allocation of additional resources.

The need for additional connections evident: if a drunk driver or nest of squirrels were to eliminate this bridge, the entire subgraph will go down. The next question: Assuming the owner of the network has the funds for only one additional link to the subgraph, what is the optimal location to place it?

For this example, let’s make the following assumptions about the network and the network owners' needs:

  • While resource A is communicating with resource B through a link or set of channels, no other resource can use the link for the duration of the communication.

  • When it rains it pours: We are trying to determine where to place additional links based on the worst-case scenario.

    • More than one channel will fail at the same time, and we know the channels between resource 3 and 1 and 3 and 2 are in a particularly vulnerable area.

    • During crisis all resources will be trying to reach the parent network simultaneously, causing network congestion

  • The network prefers shortest paths

Modeling a Crisis

MATCH p = (a:subgraph)-[r]-(b:subgraph)
WHERE a.resource_id = 3 AND b.resource_id < 3
SET r.status = 'broken'
RETURN a.resource_id, r.status, b.resource_id

Let’s make sure the query worked - resource 3 should have two broken links and one open link:

MATCH p = (a)-[r]-(b)
WHERE a.resource_id = 3
RETURN a.resource_id AS `FIRST RESOURCE`, r.status AS `LINK`, b.resource_id AS `SECOND RESOURCE`
All RESOURCEs Attempt to Find a Path to RESOURCE 1

Although in the real world this would happen near simultaneously, let’s look at the paths one by one.

First, resource 2 will attempt to reach resource 1 through the open links. If it succeeds, it sets all links on the path to 1 as 'closed'. Let’s take a look at lengths of the possible paths:

MATCH p = (a:subgraph)-[r*..3]-(b:subgraph)
WHERE a.resource_id = 1 AND b.resource_id = 2 AND ALL (r1 IN relationships(p) WHERE r1.status='open')
RETURN DISTINCT length(p) AS `Length of Path`

There are two possible paths, and only one shortest path. Resource 2 is going to take the shortest path to resource 1, closing links on the way:

MATCH p = shortestPath((a:subgraph)-[r*..3]-(b:subgraph))
WHERE a.resource_id = 1 AND b.resource_id = 2 AND ALL (r1 IN relationships(p) WHERE r1.status='open')
FOREACH (x IN relationships(p) | SET x.status = 'closed')
RETURN relationships(p)

Now resource 3 tries to reach resource 1 through the open links. If it succeeds, it too sets all links on the path to 1 as 'closed'. Let’s take a look at the options:

MATCH p = (a:subgraph)-[r*..3]-(b:subgraph)
WHERE a.resource_id = 1 AND b.resource_id = 3 AND ALL (r1 IN relationships(p) WHERE r1.status='open')
RETURN DISTINCT relationships(p)

There’s only one path to resource 1, so let’s re-run the query, this time setting all the links on the path to 1 as 'closed'. Network congestion is increasing.

MATCH p = (a:subgraph)-[r*..3]-(b:subgraph)
WHERE a.resource_id = 1 AND b.resource_id = 3 AND ALL (r1 IN relationships(p) WHERE r1.status='open')
FOREACH (x IN relationships(p) | SET x.status = 'closed')
RETURN DISTINCT relationships(p)

Now resource 4 attempts to reach resource 1:

MATCH p = (a:subgraph)-[r*..3]-(b:subgraph)
WHERE a.resource_id = 1 AND b.resource_id = 4 AND ALL (r1 IN relationships(p) WHERE r1.status='open')
RETURN DISTINCT relationships(p)

Resource 4 is blocked! What about resource 5?

MATCH p = (a:subgraph)-[r*..3]-(b:subgraph)
WHERE a.resource_id = 1 AND b.resource_id = 5 AND ALL (r1 IN relationships(p) WHERE r1.status='open')
RETURN DISTINCT relationships(p)

All possible paths to resource 1 are closed to resource 5 and resource 4.

In the case of network congestion and the failure of two links, two resources are unable to reach the service. As we have only enough resources for one additional link from the network, we now have to determine an appropriate location for the new link.

Assuming resources 3, 4, and 5 are of equal priority, and that the links between 1 and 3 and 2 and 3 will continue to have problems with connectivity, we still have a few options in which to place our new link. Recall that in the intact network resource 5 has the longest path to the network.

A link at 5:
  • Gives resource 5 a shorter path to the network when there are no outages (path of length 3 to path of length 1)

  • Allows resource 3 to contact the network without blocking resource 4’s access to the network

  • In the scenario described above (congested network with weak links down), one resource (either 3 or 5) will still be unable to access the network

A link at 4:
  • Gives resource 5 a shorter path to the network when there are no outages (path of length 3 to path of length 2)

  • Does not allows resource 3 to contact the network without blocking resource 4’s access to the network

  • In the scenario described above (congested network with weak links down), one resource (3) will still be unable to access the network

A link at 3:
  • Gives resource 5 a shorter path to the network when there are no outages (path of length 3 to path of length 2)

  • Allows resource 3 to contact the network without blocking resource 4’s access to the network

  • In the scenario described above (congested network with weak links down), one resource (either 4 or 5) will still be unable to access the network

Since all options result in one resource being blocked and placing a link at resource 5 results in the shortest path for resource 5 in both the damaged and intact network, let’s place the new link at resource 5.

MATCH (a {resource_id: 5}), (b:SERVICE)
MERGE (a)<-[:LINKS {status:'open'}]-(b)
MATCH path = shortestPath((a:subgraph)-[*..3]-(b))
WHERE NOT (b:subgraph)
RETURN DISTINCT a.resource_id AS `RESOURCE ID`, min(length(relationships(path))) AS `DISTANCE TO NETWORK`
ORDER BY `RESOURCE ID` ASC

Conclusion

Sub-graph Fault Tolerant Routing

Subgraph Fault-Tolerant Routing (SFTR) is a strategy for planning for the inevitable - resource or connection outage. Sometimes a squirrel builds a nest in part of your network. Although we have to accept that parts of our networks will break, we can also determine which components will be able to reroute and which will have the potential to be catastrophic failures.

L+1 sub-graph routing is a strategy for routing dependable connections in optical networks. In this approach each network is mapped into L distinct sub-graphs resulting from the removal of links (in this example, only one link) from the original network.

A connection from node A to B in this scheme becomes “accepted”–-in other words, identified as not potentially catastrophic–-only if it is there is a path from A to B in all sub-graphs. Ideally, we would design a network in which there is always a path from A to B given any network failure.

Why Neo4j?

The problem of modeling a live Telco network was a good fit for Neo4j’s solution, which uses nodes and relationships to describe assets on the network (switches, routers, cell towers), and the links between them (trunks, fiber optic cables, VPNs). Neo4j places no restrictions on the way the data is structured, or the data that is captured: it can model and represent the new network in a natural way. This extreme flexibility saves a great deal of time, and makes it possible to represent complex data and abstract concepts at the same time, within the same database. This is extremely powerful.

Use Case: Vivendi SFR

SFR logo

Owned by Vivendi, the French multinational media and telecommunications company, SFR is the second largest telecommunications company in France, earning nearly 12 billion Euros in annual revenue.

SFR needed a way to predict and warn customers in advance of any service interruptions in order to maintain customer service agreements and avoid financial penalties of unplanned downtime. SFR tasked a 10-person project team to find a network management solution, and brought in software consultants from London-based OpenCredo to provide best practice expertise. The team selected the Neo4j graph database to build a proof of concept app that could pinpoint any “single point of failure” across the components of the SFR multi-system network.

Cypher Appendix

Finding Bridges between RESOURCEs

MATCH path = (a:RESOURCE)-[r0]->(b:RESOURCE)
WHERE NOT (a)-[*2..3]-(b)
RETURN DISTINCT a.resource_id AS `Endpoint 1 ID`, r0.status AS `Status`, b.resource_id AS `Endpoint 2 ID`

By definition, a bridge is an edge, here incarnated by the relationship between (a) and (b): "(a:RESOURCE)-[r0]→(b:RESOURCE)". How did we eliminate edges between (a) and arbitrary node (b)s that were part of a cycle, and therefore, not bridges?

Since this is a small network with a maximum of one link per node, and GraphGists are artificially limited, we excluded paths in which (a) is connected to (b) between two or three hops of separation.

For example, a path such as (a)--(b)--()--(a), would be excluded, as (a) is 2 hops away from b.

Finding and Labeling a Subgraph

MATCH path = (a:RESOURCE { resource_id:1 })-[*1..3]-(c:RESOURCE)
WHERE ALL (n IN nodes(path) WHERE NOT n.resource_id=6)
FOREACH (n IN nodes(path)| SET n :subgraph)
RETURN path

Again, as this is a small network and we know two things: that the longest path in the subgraph in question is 3 hops, and that node 6 (the other end of the bridge connecting the subgraph) is not in the subgraph.

The query collects all paths between 1 and 3 hops away from resource 1, checks that all nodes in each path doesn’t have resource_id 6, and labels all nodes in the filtered paths with the label 'subgraph'.

Finding the Shortest Path

MATCH p = shortestPath((a:subgraph)-[r*..3]-(b:subgraph))
WHERE a.resource_id = 1 AND b.resource_id = 2 AND ALL (r1 IN relationships(p) WHERE r1.status='open')
FOREACH (x IN relationships(p) | SET x.status = 'closed')
RETURN relationships(p)

Although shortestPath seems self-explanatory, it is important to note that there may be more than one 'shortestPath' (for example, two paths of length 2 might be the shortest in the network), and the one returned by the stock query may not be the one you are interested in. Try using allShortestPaths instead.

References