Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(diagnostics) add diff endpoints #6131

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
9 changes: 9 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -119,6 +119,15 @@ Adding a new version? You'll need three changes:
- `KongCustomEntity` is now included in last valid configuration retrieved from
Kong gateways.
[#6305](https://github.com/Kong/kubernetes-ingress-controller/pull/6305)
- Added `/debug/config/diff-report` diagnostic endpoint. This endpoint is
available in DB mode when the `--dump-config` and `--dump-sensitive-config`
are enabled. It returns the latest diff information for the controller's last
configuration sync along with config hash and sync timestamp metadata. The
controller maintains the last 5 diffs in cache. You can retrieve older diffs
by appending a `?hash=<hash>` query string argument. Available config hashes
and their timestamps are listed under the `available` section of the
response.
[#6131](https://github.com/Kong/kubernetes-ingress-controller/pull/6131)

### Fixed

Expand Down
3 changes: 2 additions & 1 deletion go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -27,10 +27,11 @@ require (
github.com/go-logr/logr v1.4.2
github.com/go-logr/zapr v1.3.0
github.com/goccy/go-json v0.10.3
github.com/golang-collections/collections v0.0.0-20130729185459-604e922904d3
github.com/google/go-cmp v0.6.0
github.com/google/uuid v1.6.0
github.com/jpillora/backoff v1.0.0
github.com/kong/go-database-reconciler v1.14.2
github.com/kong/go-database-reconciler v1.14.3
github.com/kong/go-kong v0.57.0
github.com/kong/kubernetes-telemetry v0.1.4
github.com/kong/kubernetes-testing-framework v0.47.1
Expand Down
6 changes: 4 additions & 2 deletions go.sum
Original file line number Diff line number Diff line change
Expand Up @@ -138,6 +138,8 @@ github.com/goccy/go-json v0.10.3 h1:KZ5WoDbxAIgm2HNbYckL0se1fHD6rz5j4ywS6ebzDqA=
github.com/goccy/go-json v0.10.3/go.mod h1:oq7eo15ShAhp70Anwd5lgX2pLfOS3QCiwU/PULtXL6M=
github.com/gogo/protobuf v1.3.2 h1:Ov1cvc58UF3b5XjBnZv7+opcTcQFZebYjWzi34vdm4Q=
github.com/gogo/protobuf v1.3.2/go.mod h1:P1XiOD3dCwIKUDQYPy72D8LYyHL2YPYrpS2s69NZV8Q=
github.com/golang-collections/collections v0.0.0-20130729185459-604e922904d3 h1:zN2lZNZRflqFyxVaTIU61KNKQ9C0055u9CAfpmqUvo4=
github.com/golang-collections/collections v0.0.0-20130729185459-604e922904d3/go.mod h1:nPpo7qLxd6XL3hWJG/O60sR8ZKfMCiIoNap5GvD12KU=
github.com/golang/glog v0.0.0-20160126235308-23def4e6c14b/go.mod h1:SBH7ygxi8pfUlaOkMMuAQtPIUF8ecWP5IEl/CR7VP2Q=
github.com/golang/groupcache v0.0.0-20200121045136-8c9f03a8e57e/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc=
github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da h1:oI5xCqsCo564l8iNU+DwB5epxmsaqB+rhGL0m5jtYqE=
Expand Down Expand Up @@ -238,8 +240,8 @@ github.com/kisielk/errcheck v1.5.0/go.mod h1:pFxgyoBC7bSaBwPgfKdkLd5X25qrDl4LWUI
github.com/kisielk/gotool v1.0.0/go.mod h1:XhKaO+MFFWcvkIS/tQcRk01m1F5IRFswLeQ+oQHNcck=
github.com/klauspost/compress v1.17.0 h1:Rnbp4K9EjcDuVuHtd0dgA4qNuv9yKDYKK1ulpJwgrqM=
github.com/klauspost/compress v1.17.0/go.mod h1:ntbaceVETuRiXiv4DpjP66DpAtAGkEQskQzEyD//IeE=
github.com/kong/go-database-reconciler v1.14.2 h1:AWS8FjO+qhpjZ0hn26pdSNNwweDFa1pOkiQCL36/KXY=
github.com/kong/go-database-reconciler v1.14.2/go.mod h1:OsyUWH3SUvXAd5HJx3PhRU3PyzH0uCy7bTlzoSfw8Ew=
github.com/kong/go-database-reconciler v1.14.3 h1:Vmw5NNePkr2mc9r+As87dPRIsrOduM4b0zpPi9ci1mA=
github.com/kong/go-database-reconciler v1.14.3/go.mod h1:OsyUWH3SUvXAd5HJx3PhRU3PyzH0uCy7bTlzoSfw8Ew=
github.com/kong/go-kong v0.57.0 h1:e+4bHTzcO0xhFPVyGIUtlO+B4E4l14k55NH8Vjw6ORY=
github.com/kong/go-kong v0.57.0/go.mod h1:gyNwyP1fzztT6sX/0/ygMQ30OiRMIQ51b2jSfstMrcU=
github.com/kong/kubernetes-telemetry v0.1.4 h1:Yz7OlECxWKgNRG1wJ5imA4+H0dQEpdU9d86uhwUVpu4=
Expand Down
9 changes: 5 additions & 4 deletions internal/dataplane/kong_client.go
Original file line number Diff line number Diff line change
Expand Up @@ -109,7 +109,7 @@ type KongClient struct {

// diagnostic is the client and configuration for reporting diagnostic
// information during data-plane update runtime.
diagnostic diagnostics.ConfigDumpDiagnostic
diagnostic diagnostics.ClientDiagnostic

// prometheusMetrics is the client for shipping metrics information
// updates to the prometheus exporter.
Expand Down Expand Up @@ -186,7 +186,7 @@ type KongClient struct {
func NewKongClient(
logger logr.Logger,
timeout time.Duration,
diagnostic diagnostics.ConfigDumpDiagnostic,
diagnostic diagnostics.ClientDiagnostic,
kongConfig sendconfig.Config,
eventRecorder record.EventRecorder,
dbMode dpconf.DBMode,
Expand Down Expand Up @@ -825,6 +825,7 @@ func (c *KongClient) sendToClient(
c.prometheusMetrics,
c.updateStrategyResolver,
c.configChangeDetector,
&c.diagnostic,
isFallback,
)
// Only record events on applying configuration to Kong gateway here.
Expand Down Expand Up @@ -882,13 +883,13 @@ type sendDiagnosticFn func(meta diagnostics.DumpMeta, raw []byte)
func prepareSendDiagnosticFn(
ctx context.Context,
logger logr.Logger,
diagnosticConfig diagnostics.ConfigDumpDiagnostic,
diagnosticConfig diagnostics.ClientDiagnostic,
targetState *kongstate.KongState,
targetContent *file.Content,
deckGenParams deckgen.GenerateDeckContentParams,
isFallback bool,
) sendDiagnosticFn {
if diagnosticConfig == (diagnostics.ConfigDumpDiagnostic{}) {
if diagnosticConfig == (diagnostics.ClientDiagnostic{}) {
// noop, diagnostics won't be sent
return func(diagnostics.DumpMeta, []byte) {}
}
Expand Down
2 changes: 1 addition & 1 deletion internal/dataplane/kong_client_golden_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -299,7 +299,7 @@ func runKongClientGoldenTest(t *testing.T, tc kongClientGoldenTestCase) {
kongClient, err := NewKongClient(
logger,
timeout,
diagnostics.ConfigDumpDiagnostic{},
diagnostics.ClientDiagnostic{},
cfg,
mocks.NewEventRecorder(),
dpconf.DBModeOff, // Test will run in DB-less mode only for now. In the future, we may want to test DB mode as well.
Expand Down
19 changes: 11 additions & 8 deletions internal/dataplane/kong_client_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -189,7 +189,10 @@ func newMockUpdateStrategyResolver(t *testing.T) *mockUpdateStrategyResolver {
}
}

func (f *mockUpdateStrategyResolver) ResolveUpdateStrategy(c sendconfig.UpdateClient) sendconfig.UpdateStrategy {
func (f *mockUpdateStrategyResolver) ResolveUpdateStrategy(
c sendconfig.UpdateClient,
_ *diagnostics.ClientDiagnostic,
) sendconfig.UpdateStrategy {
f.lock.Lock()
defer f.lock.Unlock()

Expand Down Expand Up @@ -952,7 +955,7 @@ func setupTestKongClient(
) *KongClient {
logger := zapr.NewLogger(zap.NewNop())
timeout := time.Second
diagnostic := diagnostics.ConfigDumpDiagnostic{}
diagnostic := diagnostics.ClientDiagnostic{}
config := sendconfig.Config{
SanitizeKonnectConfigDumps: true,
}
Expand Down Expand Up @@ -1247,7 +1250,7 @@ func TestKongClient_FallbackConfiguration_SuccessfulRecovery(t *testing.T) {
kongClient, err := NewKongClient(
zapr.NewLogger(zap.NewNop()),
time.Second,
diagnostics.ConfigDumpDiagnostic{
diagnostics.ClientDiagnostic{
Configs: diagnosticsCh,
},
sendconfig.Config{
Expand Down Expand Up @@ -1384,7 +1387,7 @@ func TestKongClient_FallbackConfiguration_SkipsUpdateWhenInSync(t *testing.T) {
kongClient, err := NewKongClient(
zapr.NewLogger(zap.NewNop()),
time.Second,
diagnostics.ConfigDumpDiagnostic{
diagnostics.ClientDiagnostic{
Configs: diagnosticsCh,
},
sendconfig.Config{
Expand Down Expand Up @@ -1533,7 +1536,7 @@ func TestKongClient_FallbackConfiguration_FailedRecovery(t *testing.T) {
kongClient, err := NewKongClient(
zapr.NewLogger(zap.NewNop()),
time.Second,
diagnostics.ConfigDumpDiagnostic{
diagnostics.ClientDiagnostic{
Configs: diagnosticsCh,
},
sendconfig.Config{
Expand Down Expand Up @@ -1646,7 +1649,7 @@ func TestKongClient_LastValidCacheSnapshot(t *testing.T) {
kongClient, err := NewKongClient(
zapr.NewLogger(zap.NewNop()),
time.Second,
diagnostics.ConfigDumpDiagnostic{},
diagnostics.ClientDiagnostic{},
sendconfig.Config{
FallbackConfiguration: tc.fallbackConfigurationFeatureEnabled,
UseLastValidConfigForFallback: tc.useLastValidConfigForFallbackEnabled,
Expand Down Expand Up @@ -1734,7 +1737,7 @@ func TestKongClient_ConfigDumpSanitization(t *testing.T) {
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
diagnosticsCh := make(chan diagnostics.ConfigDump, 1) // make it buffered to avoid blocking
kongClient.diagnostic = diagnostics.ConfigDumpDiagnostic{
kongClient.diagnostic = diagnostics.ClientDiagnostic{
Configs: diagnosticsCh,
DumpsIncludeSensitive: tc.dumpsIncludeSensitive,
}
Expand Down Expand Up @@ -1869,7 +1872,7 @@ func TestKongClient_RecoveringFromGatewaySyncError(t *testing.T) {
kongClient, err := NewKongClient(
zapr.NewLogger(zap.NewNop()),
time.Second,
diagnostics.ConfigDumpDiagnostic{},
diagnostics.ClientDiagnostic{},
sendconfig.Config{
FallbackConfiguration: true,
},
Expand Down
35 changes: 32 additions & 3 deletions internal/dataplane/sendconfig/dbmode.go
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ import (
"fmt"
"reflect"
"sync"
"time"

"github.com/blang/semver/v4"
"github.com/go-logr/logr"
Expand All @@ -17,6 +18,7 @@ import (
"github.com/kong/go-kong/kong"

"github.com/kong/kubernetes-ingress-controller/v3/internal/dataplane/deckerrors"
"github.com/kong/kubernetes-ingress-controller/v3/internal/diagnostics"
"github.com/kong/kubernetes-ingress-controller/v3/internal/logging"
"github.com/kong/kubernetes-ingress-controller/v3/internal/metrics"
)
Expand All @@ -28,6 +30,7 @@ type UpdateStrategyDBMode struct {
dumpConfig dump.Config
version semver.Version
concurrency int
diagnostic *diagnostics.ClientDiagnostic
isKonnect bool
logger logr.Logger
resourceErrors []ResourceError
Expand All @@ -39,13 +42,15 @@ func NewUpdateStrategyDBMode(
dumpConfig dump.Config,
version semver.Version,
concurrency int,
diagnostic *diagnostics.ClientDiagnostic,
logger logr.Logger,
) UpdateStrategyDBMode {
return UpdateStrategyDBMode{
client: client,
dumpConfig: dumpConfig,
version: version,
concurrency: concurrency,
diagnostic: diagnostic,
logger: logger,
resourceErrors: []ResourceError{},
resourceErrorLock: &sync.Mutex{},
Expand All @@ -57,9 +62,10 @@ func NewUpdateStrategyDBModeKonnect(
dumpConfig dump.Config,
version semver.Version,
concurrency int,
diagnostic *diagnostics.ClientDiagnostic,
logger logr.Logger,
) UpdateStrategyDBMode {
s := NewUpdateStrategyDBMode(client, dumpConfig, version, concurrency, logger)
s := NewUpdateStrategyDBMode(client, dumpConfig, version, concurrency, diagnostic, logger)
s.isKonnect = true
return s
}
Expand Down Expand Up @@ -89,7 +95,7 @@ func (s UpdateStrategyDBMode) Update(ctx context.Context, targetContent ContentW
}

ctx, cancel := context.WithCancel(ctx)
go s.HandleEvents(ctx, syncer.GetResultChan())
go s.HandleEvents(ctx, syncer.GetResultChan(), s.diagnostic, fmt.Sprintf("%x", targetContent.Hash))

_, errs, _ := syncer.Solve(ctx, s.concurrency, false, false)
cancel()
Expand Down Expand Up @@ -117,13 +123,31 @@ func (s UpdateStrategyDBMode) Update(ctx context.Context, targetContent ContentW

// HandleEvents handles logging and error reporting for individual entity change events generated during a sync by
// looping over an event channel. It terminates when its context dies.
func (s *UpdateStrategyDBMode) HandleEvents(ctx context.Context, events chan diff.EntityAction) {
func (s *UpdateStrategyDBMode) HandleEvents(
ctx context.Context,
events chan diff.EntityAction,
diagnostic *diagnostics.ClientDiagnostic,
hash string,
) {
s.resourceErrorLock.Lock()
diff := diagnostics.ConfigDiff{
Hash: hash,
Entities: []diagnostics.EntityDiff{},
}
for {
select {
case event := <-events:
if event.Error == nil {
// TODO https://github.com/Kong/go-database-reconciler/issues/120
// GDR can sometimes send phantom events with no content whatsoever. This is a bug, but its cause is
// unclear. Ideally this is fixed in GDR and those events never get sent here, but as a workaround we can just
// discard anything that has no Action value as garbage, to avoid it showing up in the report endpoint.
if event.Action == "" {
continue
}
s.logger.V(logging.DebugLevel).Info("updated gateway entity", "action", event.Action, "kind", event.Entity.Kind, "name", event.Entity.Name)
eventDiff := diagnostics.NewEntityDiff(event.Diff, string(event.Action), event.Entity)
diff.Entities = append(diff.Entities, eventDiff)
} else {
s.logger.Error(event.Error, "failed updating gateway entity", "action", event.Action, "kind", event.Entity.Kind, "name", event.Entity.Name)
parsed, err := resourceErrorFromEntityAction(event)
Expand All @@ -134,6 +158,11 @@ func (s *UpdateStrategyDBMode) HandleEvents(ctx context.Context, events chan dif
}
}
case <-ctx.Done():
if diagnostic != nil {
diff.Timestamp = time.Now().Format(time.RFC3339)
diagnostic.Diffs <- diff
s.logger.V(logging.DebugLevel).Info("recorded database update events and diff", "hash", hash)
}
s.resourceErrorLock.Unlock()
return
}
Expand Down
6 changes: 4 additions & 2 deletions internal/dataplane/sendconfig/sendconfig.go
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ import (
"github.com/kong/go-kong/kong"

"github.com/kong/kubernetes-ingress-controller/v3/internal/dataplane/deckgen"
"github.com/kong/kubernetes-ingress-controller/v3/internal/diagnostics"
"github.com/kong/kubernetes-ingress-controller/v3/internal/logging"
"github.com/kong/kubernetes-ingress-controller/v3/internal/metrics"
"github.com/kong/kubernetes-ingress-controller/v3/internal/store"
Expand All @@ -22,7 +23,7 @@ import (
// -----------------------------------------------------------------------------

type UpdateStrategyResolver interface {
ResolveUpdateStrategy(client UpdateClient) UpdateStrategy
ResolveUpdateStrategy(client UpdateClient, diagnostic *diagnostics.ClientDiagnostic) UpdateStrategy
}

type AdminAPIClient interface {
Expand All @@ -48,6 +49,7 @@ func PerformUpdate(
promMetrics *metrics.CtrlFuncMetrics,
updateStrategyResolver UpdateStrategyResolver,
configChangeDetector ConfigurationChangeDetector,
diagnostic *diagnostics.ClientDiagnostic,
isFallback bool,
) ([]byte, error) {
oldSHA := client.LastConfigSHA()
Expand All @@ -72,7 +74,7 @@ func PerformUpdate(
}
}

updateStrategy := updateStrategyResolver.ResolveUpdateStrategy(client)
updateStrategy := updateStrategyResolver.ResolveUpdateStrategy(client, diagnostic)
logger = logger.WithValues("update_strategy", updateStrategy.Type())
timeStart := time.Now()
err = updateStrategy.Update(ctx, ContentWithHash{
Expand Down
24 changes: 22 additions & 2 deletions internal/dataplane/sendconfig/strategy.go
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ import (
"github.com/kong/go-kong/kong/custom"

"github.com/kong/kubernetes-ingress-controller/v3/internal/adminapi"
"github.com/kong/kubernetes-ingress-controller/v3/internal/diagnostics"
"github.com/kong/kubernetes-ingress-controller/v3/internal/metrics"
)

Expand Down Expand Up @@ -77,8 +78,9 @@ func NewDefaultUpdateStrategyResolver(config Config, logger logr.Logger) Default
// with the backoff strategy it provides.
func (r DefaultUpdateStrategyResolver) ResolveUpdateStrategy(
client UpdateClient,
diagnostic *diagnostics.ClientDiagnostic,
) UpdateStrategy {
updateStrategy := r.resolveUpdateStrategy(client)
updateStrategy := r.resolveUpdateStrategy(client, diagnostic)

if clientWithBackoff, ok := client.(UpdateClientWithBackoff); ok {
return NewUpdateStrategyWithBackoff(updateStrategy, clientWithBackoff.BackoffStrategy(), r.logger)
Expand All @@ -87,7 +89,10 @@ func (r DefaultUpdateStrategyResolver) ResolveUpdateStrategy(
return updateStrategy
}

func (r DefaultUpdateStrategyResolver) resolveUpdateStrategy(client UpdateClient) UpdateStrategy {
func (r DefaultUpdateStrategyResolver) resolveUpdateStrategy(
client UpdateClient,
diagnostic *diagnostics.ClientDiagnostic,
) UpdateStrategy {
adminAPIClient := client.AdminAPIClient()

// In case the client communicates with Konnect Admin API, we know it has to use DB-mode. There's no need to check
Expand All @@ -100,6 +105,20 @@ func (r DefaultUpdateStrategyResolver) resolveUpdateStrategy(client UpdateClient
},
r.config.Version,
r.config.Concurrency,
// The DB mode update strategy is used for both DB mode gateways and Konnect-integrated controllers. In the
// Konnect case, we don't actually want to collect diffs, and don't actually provide a diagnostic when setting
// it up, so we only collect and send diffs if we're talking to a gateway.
//
// TODO maybe this is wrong? I'm not sure if we actually support (or if not, explicitly prohibit)
// configuring a controller to use both DB mode and talk to Konnect, or if we only support DB-less when using
// Konnect. If those are mutually exclusive, maybe we can just collect diffs for Konnect mode? If they're
// not mutually exclusive, trying to do diagnostics diff updates for both the updates would have both attempt
// to store diffs. This is... maybe okay. They should be identical, but that's a load-bearing "should": we know
// Konnect can sometimes differ in what it accepts versus the gateway, and we have some Konnect configuration
// (consumer exclude, sensitive value mask) where they're _definitely_ different. That same configuration could
// make the diff confusing even if it's DB mode only, since it doesn't reflect what we're sending to the gateway
// in some cases.
nil,
r.logger,
)
}
Expand All @@ -114,6 +133,7 @@ func (r DefaultUpdateStrategyResolver) resolveUpdateStrategy(client UpdateClient
},
r.config.Version,
r.config.Concurrency,
diagnostic,
r.logger,
)
}
Expand Down
2 changes: 1 addition & 1 deletion internal/dataplane/sendconfig/strategy_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -92,7 +92,7 @@ func TestDefaultUpdateStrategyResolver_ResolveUpdateStrategy(t *testing.T) {
InMemory: tc.inMemory,
}, zapr.NewLogger(zap.NewNop()))

strategy := resolver.ResolveUpdateStrategy(updateClient)
strategy := resolver.ResolveUpdateStrategy(updateClient, nil)
require.Equal(t, tc.expectedStrategyType, strategy.Type())
assert.True(t, client.adminAPIClientWasCalled)
assert.Equal(t, tc.expectKonnectControlPlaneCall, client.konnectControlPlaneWasCalled)
Expand Down
Loading
Loading