reguero edited this page Apr 3, 2022 · 7 revisions

DNS Load Balancing

The load balancing service dynamically manages the list of machines behind a given DNS alias, so that several nodes can be presented behind a single name. This allows applications to scale and improves their availability. It is one of the technologies that enable the deployment of large-scale applications on cloud resources.

The Domain Name System (DNS) is a naming system for computers and other resources. DNS is essential to the Internet: it is an Internet-standard protocol that lets clients use names instead of IP addresses. Load balancing is an advanced function that DNS can provide: requests are distributed across several machines running the same service, all reachable under the same DNS name.

The LBD DNS Load Balancer has been developed as a cost-effective solution for applications that can accept the DNS timing constraints and do not require affinity (also known as persistence or sticky sessions). As of March 2019, it is used by 761 services on the site, with two small VMs acting as the LBD master and slave. Each alias member node runs a Simple Network Management Protocol (SNMP) agent that communicates with the lbclient program. The lbclient provides a load metric, which the LBD server uses to select the subset of member nodes whose IP addresses are published under the alias. An overview of the service follows:

The LBD server periodically gets a load metric from the lbclient on each alias member node using SNMP. It uses this information to update, via Dynamic DNS (see RFC 2136), the A (IPv4) and AAAA (IPv6) records of the delegated DNS zone that corresponds to the alias. The polling period ("polling_interval") is 5 minutes by default.
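The node-selection step can be sketched as below. This is a minimal illustration, not the LBD server's code. It assumes that a lower load metric means a less loaded node and that a hypothetical `best_hosts` parameter sets how many nodes are published. SNMP polling and the RFC 2136 update itself are left out; only the choice of nodes and the resulting A/AAAA record sets are shown.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address


@dataclass
class Member:
    name: str
    load: int  # metric reported by lbclient; <= 0 means "not available"
    addresses: list[str] = field(default_factory=list)  # IPv4/IPv6 literals


def select_members(members: list[Member], best_hosts: int) -> list[Member]:
    """Keep available nodes and return the best_hosts least loaded ones."""
    available = [m for m in members if m.load > 0]
    # Assumption: lower metric = less loaded; the name breaks ties deterministically.
    available.sort(key=lambda m: (m.load, m.name))
    return available[:best_hosts]


def build_records(selected: list[Member]) -> tuple[list[str], list[str]]:
    """Split the selected addresses into A (IPv4) and AAAA (IPv6) record sets."""
    a_records: set[str] = set()
    aaaa_records: set[str] = set()
    for member in selected:
        for addr in member.addresses:
            if ip_address(addr).version == 4:
                a_records.add(addr)
            else:
                aaaa_records.add(addr)
    return sorted(a_records), sorted(aaaa_records)
```

Nodes reporting 0 or a negative value are skipped, which matches the rule that such a metric means the machine is not available.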

The LBD slave behaves like the LBD master, i.e. it periodically gets a load metric from the alias member nodes. However, it only updates the delegated DNS zone when it loses contact with the LBD master. It detects this by trying to fetch a "heartbeat" file from a web server on the LBD master.
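A minimal sketch of the slave's failover decision follows. The heartbeat URL, the assumption that the file contains a Unix timestamp written by the master, and the 10-minute staleness threshold are all hypothetical; the text above only states that the slave tries to fetch a heartbeat file from the master's web server.

```python
import time
import urllib.request

HEARTBEAT_URL = "http://lbd-master.example.org/heartbeat"  # hypothetical URL
MAX_AGE_SECONDS = 600  # hypothetical staleness threshold


def fetch_heartbeat(url: str = HEARTBEAT_URL, timeout: float = 10.0) -> str:
    """Return the heartbeat file body as text; raises OSError on network failure."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def master_alive(fetch=fetch_heartbeat, now=None, max_age=MAX_AGE_SECONDS) -> bool:
    """True if the heartbeat could be fetched and its timestamp is recent enough."""
    now = time.time() if now is None else now
    try:
        # Assumption: the heartbeat file holds a Unix timestamp.
        stamp = float(fetch().strip())
    except (OSError, ValueError):
        # Unreachable master or unreadable heartbeat: treat the master as lost.
        return False
    return now - stamp <= max_age


def slave_should_update(fetch=fetch_heartbeat, now=None) -> bool:
    """The slave only pushes DNS updates when it has lost contact with the master."""
    return not master_alive(fetch, now)
```

Passing the fetch function as a parameter keeps the decision logic testable without a live master.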

The lbclient provides a built-in load metric. Alternative load metrics can be configured by combining several Collectd metrics and constants. Health checks can also be configured so that an alias member is taken out of the alias when a certain condition is triggered. A typical example is checking the Roger state, so that the node is taken out when its appstate is not 'production'. In addition to several built-in checks, you can configure further checks using Collectd metrics, or use the return code of an arbitrary program (or script) as a check. If the node is working, its load metric is an integer greater than 0; a load metric of 0 or lower means that the machine is not available.
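How checks and a combined metric could interact is sketched below. The function names, the linear weighting of Collectd values, the -1 returned on a failed check and the "exit code 0 means pass" convention for scripts are assumptions for illustration, not lbclient's actual configuration syntax or code.

```python
import subprocess
from typing import Callable, Mapping, Sequence


def script_check(cmd: Sequence[str]) -> bool:
    """Assumed convention: the check passes if the program exits with code 0."""
    result = subprocess.run(list(cmd), stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def combined_metric(values: Mapping[str, float],
                    weights: Mapping[str, float],
                    constant: float = 0.0) -> int:
    """Hypothetical metric: a constant plus a weighted sum of Collectd values."""
    total = constant + sum(w * values[name] for name, w in weights.items())
    return int(round(total))


def load_metric(checks: Sequence[Callable[[], bool]],
                metric: Callable[[], int]) -> int:
    """Run every health check first; any failure makes the node unavailable."""
    for check in checks:
        if not check():
            return -1  # hypothetical "not available" value (any value <= 0 works)
    return metric()


def is_available(load: int) -> bool:
    """A node is usable only while its load metric is greater than 0."""
    return load > 0
```

For example, a Roger-style check could be written as `lambda: appstate == "production"`, where `appstate` would come from a hypothetical Roger lookup.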

We have produced an LBaaS (Load Balancing as a Service) interface with a self-service GUI to facilitate alias creation and management.

DNS Aliases that are not load balanced

Normal DNS aliases that are not load balanced (CNAME records) can be defined for each node in the LANDB GUI at https://network.cern.ch. Please note that these normal DNS aliases are not part of the DNS Load Balancing service.

Please also note that there is a caveat for OpenStack VMs: the Cloud team has taken control of the LANDB entries for the VMs, so you should follow the instructions in the CERN Cloud documentation instead.
