Document connectivity check UDFs in 11.0 #1025

Open
hanefi opened this issue Dec 16, 2021 · 1 comment
hanefi commented Dec 16, 2021

Why are we implementing it? (sales eng)

  1. Improved supportability.

Citus MX (and Citus in general) requires that nodes be able to connect to each other. Citus now provides a simple UDF that reports connectivity across the whole cluster.

  2. Possible improvements in data planes

We can check for connectivity issues regularly in our managed database offerings. This can help detect network issues before they are reported by the customer via a ticket.

What are the typical use cases?

  • Routine checks in control planes
  • Database operators troubleshooting a cluster

Communication goals (e.g. detailed howto vs orientation)

Good locations for content in docs structure

  • A new section under Useful Diagnostic Queries
  • Cluster Management -> Security -> Connection management
    We can add a note here telling users that the new UDF can be used to check for broken SSL setups or other connection management configuration problems.

How does this work? (devs)

citus_check_cluster_node_health() returns a set of records containing the names and ports of the source and target nodes, along with the connectivity check result.

The result column has three possible values:

  • true : connection attempt from source to target succeeded.
  • false : connection attempt from source to target failed.
  • NULL : connection attempt from the current node to source node failed.
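To make the three values easier to read in a report, they can be mapped to labels. The query below is only a sketch: the status strings are made up for illustration, and the column names are taken from the sample output in this issue.

```sql
-- Label each node pair according to the three possible result values.
-- The label text is hypothetical, not something the UDF returns.
SELECT from_nodename, from_nodeport, to_nodename, to_nodeport,
       CASE
           WHEN result     THEN 'ok'
           WHEN NOT result THEN 'source could not reach target'
           ELSE                 'current node could not reach source'
       END AS status
FROM citus_check_cluster_node_health()
ORDER BY 1, 2, 3, 4;
```

A NULL result matches neither WHEN branch, so it falls through to the ELSE branch.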

Example sql

This UDF can return many rows on large clusters. We suggest that operators first run the following query, which returns a single boolean. If it is true, all the checks were successful. Otherwise, the operator should look into the connectivity report for all the worker pairs.

SELECT bool_and(coalesce(result, false)) FROM citus_check_cluster_node_health();
 bool_and
---------------------------------------------------------------------
 f
(1 row)

For a quick summary, one can also use the following query to get the distinct counts of connectivity results in the cluster.

SELECT result, count(*) FROM citus_check_cluster_node_health() GROUP BY result ORDER BY 1;
 result | count
---------------------------------------------------------------------
 t      |     2
        |     2
(2 rows)

To check all the results for the connectivity checks, run the following:

SELECT * FROM citus_check_cluster_node_health() ORDER BY 1,2,3,4;
 from_nodename | from_nodeport | to_nodename | to_nodeport | result
---------------------------------------------------------------------
 localhost     |         57637 | localhost   |       57637 | t
 localhost     |         57637 | localhost   |       57638 | t
 localhost     |         57638 | localhost   |       57637 | 
 localhost     |         57638 | localhost   |       57638 | 
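On a large cluster it may be more practical to list only the pairs that need attention. A possible sketch, assuming the column names shown in the output above:

```sql
-- Keep only the pairs whose check failed (false) or could not be run (NULL).
-- IS NOT TRUE matches both false and NULL, unlike "result = false".
SELECT from_nodename, from_nodeport, to_nodename, to_nodeport, result
FROM citus_check_cluster_node_health()
WHERE result IS NOT TRUE
ORDER BY 1, 2, 3, 4;
```

Against the sample output above, this would keep only the two rows whose source node is on port 57638.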

Corner cases, gotchas

It is risky to use the following query to get an aggregate summary:
SELECT bool_and(result) FROM citus_check_cluster_node_health();
because the bool_and aggregate ignores NULL inputs, so it returns true even if some of the rows are NULL. Using coalesce(result, false) is crucial here.
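The difference can be reproduced on plain PostgreSQL without a cluster. The sketch below uses a VALUES list as a stand-in for the UDF output; the aliases checks, naive and safe are made up for illustration.

```sql
-- Stand-in rows: one successful check and one NULL (source node unreachable).
SELECT bool_and(result)                  AS naive,
       bool_and(coalesce(result, false)) AS safe
FROM (VALUES (true), (NULL::boolean)) AS checks(result);
-- naive is true because the NULL row is skipped; safe is false.
```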

Are there relevant blog posts or outside documentation about the concept/feature?

Link to relevant commits and regression tests if applicable

citusdata/citus#5509

@hanefi hanefi added this to the 11.0 Release milestone Dec 16, 2021
@jonels-msft

#1038 includes docs for citus_check_cluster_node_health() itself, but we're still lacking content in Useful Diagnostic Queries and on the cluster management page, as you suggested.
