-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[V2] Enable metrics server #706
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## v2 #706 +/- ##
==========================================
+ Coverage 49.57% 50.22% +0.65%
==========================================
Files 27 27
Lines 2911 2919 +8
==========================================
+ Hits 1443 1466 +23
+ Misses 1290 1272 -18
- Partials 178 181 +3 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really cool.
The "filter" seems to be a perfect compliment to our plan for authenticating the Automation RPC endpoint. It imports the "apiserver" library and that will probably be useful for that effort.
@@ -29,6 +29,29 @@ require ( | |||
sigs.k8s.io/controller-runtime v0.18.4 | |||
) | |||
|
|||
require ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit -
There should only be two require
blocks (for direct and indirect) but I'm guessing go is getting confused by the replace directive? I wonder if go mod tidy
consolidates these after the replace
is moved to the top or bottom of the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the go.mod
file by manually deleting the indirect
block and letting it regenerate.
flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.") | ||
flag.BoolVar(&enableLeaderElection, "leader-elect", false, | ||
"Enable leader election for controller manager. "+ | ||
"Enabling this will ensure there is only one active controller manager.") | ||
flag.BoolVar(&secureMetrics, "metrics-secure", false, | ||
"If set the metrics endpoint is served securely") | ||
flag.BoolVar(&secureMetrics, "metrics-secure", true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -43,7 +43,6 @@ spec: | |||
args: | |||
- --leader-elect | |||
- --health-probe-bind-address=:8081 | |||
- --metrics-bind-address=:8383 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's an argument for keeping the port as-is considering the v1 chart spits out a ServiceMonitor that targets :8383 instead of the port name. It would be nice for the user's metrics to continue to Just Work™ -- although if we're changing up the keys significantly then maybe there's no point.
In any case we should probably try to keep the metrics service as close as possible to what we have in v1, to at least keep selectors the same.
Maybe unrelated - it seems like the v1 operator has two metrics ports, 8383 and 8686? Do you know why that is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PKO v1 was scaffolded with operator-sdk
pre v1, which did not use controller-runtime
/kubebuilder
under the hood. It seems like the older version of the framework was modelled using kube-state-metrics
(where 2 ports were exposed, one for the operator, and another for metrics on the CRs itself, ref). Kubebuilder uses a more simplified approach which exposes a globabl prometheus registry that we can add additional metrics to.
With the implementation of metrics being consolidated into 1 port, and industry standards moving to secure metrics by default, now seemed like a good time to make this change. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah totally agree one port is the most reasonable. AFAICT the other port wasn't actually used in v1. I'm still wondering if it's OK to change the port, though. Existing service monitors would stop scraping metrics, right?
return collectors | ||
} | ||
|
||
func init() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙌 🙌 🙌
. "github.com/onsi/ginkgo/v2" | ||
. "github.com/onsi/gomega" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't like the linting sacrifice we take on by using these libraries. Especially in this case I think it makes things harder to read and more complicated than necessary. You're free to take this or leave it!
package pulumi
import (
"testing"
"github.com/prometheus/client_golang/prometheus/testutil"
"github.com/stretchr/testify/assert"
)
func TestProgramMetrics(t *testing.T) {
numPrograms.Set(0)
// it should increment the programs_active metric when a new Program is created
newProgramCallback(nil)
assert.Equal(t, 1.0, testutil.ToFloat64(numPrograms))
// it should increment the programs_active metric when another Program is created
newProgramCallback(nil)
assert.Equal(t, 2.0, testutil.ToFloat64(numPrograms))
// it should decrement the programs_active metric when a Program is deleted
deleteProgramCallback(nil)
assert.Equal(t, 1.0, testutil.ToFloat64(numPrograms))
// it should decrement the programs_active metric when another Program is deleted
deleteProgramCallback(nil)
assert.Equal(t, 0.0, testutil.ToFloat64(numPrograms))
// it should not decrement the programs_active metric when a Program is deleted and the metric is already at 0
deleteProgramCallback(nil)
assert.Equal(t, 0.0, testutil.ToFloat64(numPrograms))
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this, and in fact, I originally wrote this test using std Golang. However, I switched to using ginkgo ultimately, for consistency with other tests that we already have that test metrics. Also, this appeared to be more of a BDD style test, which is why I ultimately aligned with using Ginkgo.
Since we're already taking linting sacrifices in other files due to ginkgo, I'll leave this as is since I believe consistency here is more important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edit: whoops, that comment was in response to the metrics port.
This commit exposes the metrics server of the global prometheus registry. The default port is now set to :8443, with secureMetrics now defaulting to `true`.
4b44748
to
3554cd5
Compare
Proposed changes
This PR exposes the metrics service in our manifests and adds new unit tests to ensure the metrics logic is correct.
Technical changes
sigs.k8s.io/controller-runtime/pkg/metrics/filters
dependency to allow metrics to be secure by default (guidance from kubebuilder book/scaffold)updateStackCallback
functionstacks_reconciling
metricTesting
Note, e2e tests will be added in a followup PR.
Related issues (optional)
Closes: #576