An application clustering mechanism

This is a drop-in module that can be used to make any Python 3.7+ application aware of its multiple concurrently running instances.

The clustering works by:

- Sending UDP heartbeats between application nodes from a background thread
- Running a background maintenance thread that periodically checks the health of the entire cluster
- Using central storage (a shared filesystem, a database, etc.) to record which nodes are currently part of the cluster
- Electing one node as "primary"
- Providing callbacks that fire when a node becomes primary, loses its primary status, or is instructed by the cluster to shut down
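
To make the heartbeat idea more concrete, here is a minimal, standalone sketch of sending and receiving a UDP heartbeat and counting consecutive misses. It is not the code used inside `cluster.py`: the JSON payload format and the names `send_heartbeat`, `receive_heartbeat` and `MissCounter` are assumptions made for illustration only.

```python
import json
import socket
import time


def send_heartbeat(sock, address, node_name):
    """Send one heartbeat datagram carrying the sender's name and a timestamp."""
    payload = json.dumps({"node": node_name, "sent_at": time.time()}).encode("utf-8")
    sock.sendto(payload, address)


def receive_heartbeat(sock, timeout):
    """Wait up to `timeout` seconds for a heartbeat; return the node name or None."""
    sock.settimeout(timeout)
    try:
        data, _sender = sock.recvfrom(1024)
    except socket.timeout:
        return None
    return json.loads(data.decode("utf-8"))["node"]


class MissCounter:
    """Track consecutive missed heartbeats per node (hypothetical helper)."""

    def __init__(self, max_missed):
        self.max_missed = max_missed
        self.missed = {}

    def record(self, node_name, heartbeat_seen):
        """Update the counter; return True when the node should be removed."""
        if heartbeat_seen:
            self.missed[node_name] = 0
        else:
            self.missed[node_name] = self.missed.get(node_name, 0) + 1
        return self.missed[node_name] >= self.max_missed
```

In this sketch a node would be flagged for removal once `record()` has seen `max_missed` consecutive misses, and a single received heartbeat resets its counter.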

View documentation

Install the pdoc package and run pdoc to view HTML-formatted API documentation in your browser:

```
~$ pip install pdoc
~$ pdoc -d google cluster.py
```

Basic usage

From each application instance, create a cluster control instance:

```
import cluster


def become_primary():
    """Function called when this node becomes primary."""


def leave_primary():
    """Function called when this node stops being primary."""


def forced_exit():
    """Function called when cluster wants this node to stop."""


# path to a network share that every node can access
shared_backend = cluster.SharedFsStorageBackend("/shared/storage/clustering")

cluster_ctrl = cluster.ClusterControl(
    cluster_name="sample_cluster",  # name of cluster that everyone joins
    host_name="0.0.0.0",  # host address that every node can reach
    node_name="node01",  # our unique name
    hb_port=44330,  # UDP port for our heartbeat
    hb_interval=5,  # interval between sending heartbeats
    hb_timeout=10,  # seconds until a heartbeat from a node is considered as missed
    hb_missed_count=2,  # remove nodes after 2 misses (a bit low for production)
    own_version="1.0",  # node will exit if any node has a higher version
    check_interval=2,  # how often cluster health should be rechecked
    start_primary_callback=become_primary,
    stop_primary_callback=leave_primary,
    exit_callback=forced_exit,
    storage_backend=shared_backend)
```
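
The callbacks in the example are empty placeholders. One possible way to fill them in, assuming they may be invoked from one of the module's background threads, is to have them only flip `threading.Event` flags that the application's own loop then checks. The names `is_primary`, `must_exit` and `run_once` below are hypothetical and not part of the `cluster` module.

```python
import threading

# Flags shared between the cluster callbacks and the application loop.
is_primary = threading.Event()
must_exit = threading.Event()


def become_primary():
    """Called by the cluster control when this node becomes primary."""
    is_primary.set()


def leave_primary():
    """Called by the cluster control when this node stops being primary."""
    is_primary.clear()


def forced_exit():
    """Called by the cluster control when the cluster wants this node to stop."""
    is_primary.clear()
    must_exit.set()


def run_once(primary_task, replica_task):
    """Run one iteration of application work, depending on the current role."""
    if must_exit.is_set():
        return "exit"
    if is_primary.is_set():
        primary_task()
        return "primary"
    replica_task()
    return "replica"
```

Keeping the callbacks this small means they return quickly, and all role-dependent work stays in the application's own thread.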

From the main thread of the application, let the node join the cluster:
```
cluster.join_cluster(cluster_ctrl)
```
Then, still from the main thread of the application, start the cluster control:
```
cluster_ctrl.start()
```

When the application stops, let the cluster control finish all background activities and then gracefully exit the cluster:

```
cluster_ctrl.stop()
cluster_ctrl.join(10)  # timeout should be longer than the heartbeat check interval
cluster.leave_cluster(cluster_ctrl)
```
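
If the application can fail with an exception, the same join, start, stop, join and leave order can be enforced with `try`/`finally`. The helper below is a sketch under that assumption; `run_clustered` is a hypothetical name, and it only relies on the calls shown in this README, which are passed in as arguments so the helper stays independent of the module.

```python
def run_clustered(cluster_ctrl, join_cluster, leave_cluster, main, join_timeout=10):
    """Join the cluster, run `main`, and always shut down in the documented order."""
    join_cluster(cluster_ctrl)
    try:
        cluster_ctrl.start()
        try:
            return main()
        finally:
            # Stop background activity and wait for it to finish.
            cluster_ctrl.stop()
            cluster_ctrl.join(join_timeout)
    finally:
        # Leave the cluster even if start() or main() raised.
        leave_cluster(cluster_ctrl)
```

With the objects from the basic usage example, this would be called as `run_clustered(cluster_ctrl, cluster.join_cluster, cluster.leave_cluster, main)`, where `main` is the application's own entry point.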

For more detailed usage instructions, refer to the examples/ folder.

Split-brain scenario handling

Rudimentary support for handling node isolation is available:

- When the UDP network becomes entirely unavailable but shared storage remains accessible, all nodes briefly drop out and then re-join as-is. Whoever was primary remains primary.
- When the primary node becomes isolated from the rest but shared storage is still available, the other nodes remove the primary and elect a new one. The old primary automatically re-joins and stops its primary functions.
- When shared storage becomes unavailable, the cluster state freezes as it was and no updates to the cluster are possible. The nodes must be restarted manually to restore clustering.
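
The isolated-primary case can be walked through with a small in-memory simulation. This is only an illustration of the described behaviour, not the module's election logic: the README does not state how a primary is chosen, so picking the alphabetically first node name is an arbitrary rule for this sketch, and `elect_primary` and `remove_unreachable` are hypothetical helpers.

```python
def elect_primary(members, current_primary):
    """Keep the current primary if it is still a member, else pick a new one.

    Picking the alphabetically first name is an arbitrary rule for this sketch.
    """
    if current_primary in members:
        return current_primary
    return min(members) if members else None


def remove_unreachable(members, reachable):
    """Return the members whose heartbeats are still being received."""
    return {name for name in members if name in reachable}


members = {"node01", "node02", "node03"}
primary = elect_primary(members, None)  # node01

# node01 becomes isolated: the other nodes stop receiving its heartbeats.
members = remove_unreachable(members, reachable={"node02", "node03"})
primary = elect_primary(members, primary)  # node02 takes over

# node01 re-joins through shared storage, but does not reclaim primary status.
members.add("node01")
primary_after_rejoin = elect_primary(members, primary)  # still node02
```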
