Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix varnish monitoring 2.0 #1035

Draft
wants to merge 1 commit into
base: fc-24.05-dev
Choose a base branch
from

Conversation

PhilTaken
Copy link
Member

the default monitoring check relied on varnish to run on a specific address.
since this address can be changed, the sensu check needs to be adjusted to correctly infer the host and port that varnish is running on

This pr is based on #996 but fixes an issue where an incorrect address was inferred when multiple addresses were passed to varnish.

FC-36891

@flyingcircusio/release-managers

Release process

Impact:

Changelog:

PR release workflow (internal)

  • PR has internal ticket
  • internal issue ID (PL-…) part of branch name
  • internal issue ID mentioned in PR description text
  • ticket is on Platform agile board
  • ticket state set to Pull request ready
  • if ticket is more urgent than within the next few days, directly contact a member of the Platform team

Design notes

  • Provide a feature toggle if the change might need to be adjusted/reverted quickly depending on context. Consider whether the default should be on or off. Example: rate limiting.
  • All customer-facing features and (NixOS) options need to be discoverable from documentation. Add or update relevant documentation such that hosted and guided customers can understand it as well.

Security implications

  • Security requirements defined? (WHERE)
    • the varnish monitors the correct address when not using the default
  • Security requirements tested? (EVIDENCE)

case where a single address is set:

phil@test01 ~ $ sensu-client-show-config | jq '.checks.varnish_http'
{
  "command": "check_http -H 172.22.57.135 -p 8080 -c 10 -w 3 -t 20 -e HTTP",
  "interval": 60,
  "notification": "varnish port 8080 HTTP response",
  "standalone": true,
  "timeout": null,
  "ttl": null,
  "warnIsCritical": false
}

phil@test01 ~ $ cat /etc/local/nixos/varnish_http.nix
{ lib, ... }:
{
  services.varnish.http_address = lib.mkForce "172.22.57.135:8080";
}

phil@test01 ~ $

case where multiple addresses are set:

phil@test01 ~ $ sensu-client-show-config | jq '.checks.varnish_http'
{
  "command": "check_http -H 127.0.0.1 -p 8070 -c 10 -w 3 -t 20 -e HTTP",
  "interval": 60,
  "notification": "varnish port 8070 HTTP response",
  "standalone": true,
  "timeout": null,
  "ttl": null,
  "warnIsCritical": false
}

phil@test01 ~ $ cat /etc/local/nixos/varnish_http.nix
{ lib, ... }:
{
  services.varnish.http_address = lib.mkForce "*:8070 -a 127.0.0.1:8081";
}

phil@test01 ~ $

and the default when no address is set:

phil@test01 ~ $ sudo nix repl
[sudo] password for phil:
Welcome to Nix 2.18.1. Type :? for help.

nix-repl> :l <nixpkgs/nixos>
Added 6 variables.

nix-repl> config.services.varnish.http_address
"172.22.57.135:8008 -a [2a02:248:101:63::19db]:8008 -a 127.0.0.1:8008 -a [::1]:8008"

nix-repl>

phil@test01 ~ $ cat /etc/local/nixos/varnish_http.nix
{ lib, ... }:
{
}

phil@test01 ~ $ sensu-client-show-config | jq '.checks.varnish_http'
{
  "command": "check_http -H 172.22.57.135 -p 8008 -c 10 -w 3 -t 20 -e HTTP",
  "interval": 60,
  "notification": "varnish port 8008 HTTP response",
  "standalone": true,
  "timeout": null,
  "ttl": null,
  "warnIsCritical": false
}

phil@test01 ~ $

@dpausp dpausp changed the base branch from fc-23.11-dev to fc-24.05-dev June 28, 2024 12:02
@dpausp dpausp changed the title [FC-36891] fix varnish monitoring 2.0 Fix varnish monitoring 2.0 Jun 28, 2024
Copy link
Member

@osnyx osnyx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the code in general looks plausible, there are some general questions and decisions we need to take before investigating the details makes sense.

Happy to discuss this with you if needed or I misunderstood something.

@@ -76,9 +76,18 @@ in {
http_address = mkOption {
type = types.str;
default = "*:8008";
description = ''
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a proper option description is good. We might want to document the approach of how to add multiple listen addresses here as well.

nixos/roles/webproxy.nix Show resolved Hide resolved
command = "check_http -H localhost -p 8008 -c 10 -w 3 -t 20 -e HTTP";
varnish_http = let
# the plattform module passes multiple addresses in the form of '<address>:<port> -a <address>:<port> -a <address>:<port>'
first_address = builtins.elemAt (lib.splitString " " cfg.http_address) 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question here is what we want to achieve with our monitoring:

  • monitor the varnish server process and whether it works -> it is sufficient to reach varnish via any of its multiple listen addresses, in this implementation we always take the first one
  • monitor the listen address configuration as well: is varnish actually reachable via all configured IP-port configurations? Did we mis-type something, did one of the IPs somehow change, etc…
    • in that case, we need to run the http check against all the configured listen addresses, not just the first one.

the default monitoring check relied on varnish to run on a specific
address.
since this address can be changed, the sensu check needs to be adjusted
to correctly infer the host and port that varnish is running on
@osnyx osnyx force-pushed the phil/FC-36891_varnish-monitoring branch from 5fa6407 to af4bccc Compare July 30, 2024 11:37
@osnyx
Copy link
Member

osnyx commented Jul 30, 2024

rebased onto latest dev branch to fix a test failure in docs.

@osnyx
Copy link
Member

osnyx commented Jul 30, 2024

We also noticed that the docs regarding monitoring are outdated. It'd be really neat if you updated that part to reflect the current state. Can be a brief overview of what monitoring is happening, no need for deep details.

https://doc.flyingcircus.io/roles/fc-24.05-production/webproxy.html

@ctheune ctheune marked this pull request as draft August 14, 2024 09:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants