Bug: polling health checker will always fail #69

Closed
humbornjo opened this issue Nov 15, 2024 · 3 comments · Fixed by #70
Comments

@humbornjo

We ran into some trouble with client connection control and tried to use the health check. However, the default health checker always fails with context deadline exceeded. I think it's because line 332 in httplb/balancer.go should use WithTimeout instead of WithCancel.

connection := conns[i]
connCtx, connCancel := context.WithCancel(b.ctx)
healthChecker := b.healthChecker.New(connCtx, connection, b)
go func() {
	defer connCancel()
	if err := connection.Prewarm(connCtx); err == nil {
		b.warmedUp(connection)
	}
}()

Here is my test code:

// server.go
package main

import (
	"github.com/gofiber/fiber/v2"
	"github.com/gofiber/fiber/v2/middleware/healthcheck"
	"github.com/gofiber/fiber/v2/middleware/recover"
)

func main() {
	app := fiber.New()

	app.Use(recover.New())
	app.Use(healthcheck.New(
		healthcheck.Config{
			LivenessEndpoint: "/healthz",
			LivenessProbe: func(c *fiber.Ctx) bool {
				c.Send([]byte("OK"))
				return true
			}},
	))
	app.Get("/foo", func(c *fiber.Ctx) error {
		return c.Send([]byte("bar"))
	})
	app.Get("/", func(c *fiber.Ctx) error {
		return c.Send([]byte("bar"))
	})
	if err := app.Listen(":3000"); err != nil {
		panic(err)
	}
}

// client.go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"time"

	"github.com/bufbuild/httplb"
	"github.com/bufbuild/httplb/health"
)

func main() {

	url := url.URL{
		Scheme: "http",
		Host:   "localhost:3000",
		Path:   "foo",
	}

	client := httplb.NewClient(
		httplb.WithHealthChecks(
			health.NewPollingChecker(
				health.PollingCheckerConfig{
					Timeout:         100 * time.Second,
					PollingInterval: 100 * time.Second,
				},
				health.NewSimpleProber("/healthz"),
			),
		),
	)

	for range 5 {
		req, err := http.NewRequest(http.MethodGet, url.String(), http.NoBody)
		if err != nil {
			fmt.Println(err)
			continue
		}

		resp, err := client.Do(req)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println(resp.Status)
		time.Sleep(1 * time.Second)
	}
}

The error log:

200 OK
Get "http://localhost:3000/foo": unavailable: no healthy connections
Get "http://localhost:3000/foo": unavailable: no healthy connections
Get "http://localhost:3000/foo": unavailable: no healthy connections
Get "http://localhost:3000/foo": unavailable: no healthy connections
@humbornjo humbornjo changed the title polling health checker will always fail Bug: polling health checker will always fail Nov 15, 2024
@humbornjo
Author

By the way, I wrote a dummy prober that makes no use of the ctx, and it works well.

@humbornjo
Author

Speaking of which, there is also some weird health behavior when the connection's dial times out: it should be reported unhealthy, but in fact the timed-out connection is still used. I am still trying to comprehend the code logic for this part; maybe we can discuss it in another thread.

@jhump
Member

jhump commented Nov 18, 2024

Speaking of which, there is also some weird health behavior when the connection's dial times out: it should be reported unhealthy, but in fact the timed-out connection is still used.

This should not be the case. If dialing times out, it will ultimately cause the HTTP request to fail and set the state to unhealthy (link).

However, what you might be seeing is that the unhealthy connection is attempted for use until the health check times out. There are currently a few internal constants and TODOs to possibly make them configurable. But the logic is that it tries to create a pool with some minimum size. And if the set of known-healthy connections/backends isn't large enough, it will also include connections/backends whose health state is still unknown.
