Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poc cost attribution #9392

Draft
wants to merge 13 commits into
base: main
Choose a base branch
from
Draft

Poc cost attribution #9392

wants to merge 13 commits into from

Conversation

ying-jeanne
Copy link
Contributor

@ying-jeanne ying-jeanne commented Sep 24, 2024

What this PR does

The prototype to support adding custom label to track active series/ samples received / discarded samples related to this ticket #5698.

The cost attribution related metrics would be exported through a different endpoint.

Which issue(s) this PR fixes or relates to

Fixes #

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

pkg/ingester/ingester.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
@bboreham
Copy link
Contributor

Glad to see this make some progress. Ideally I would like to see this kind of implementation take over from usage groups, in simple cases. So it would be good to have the label names, etc., line up.

pkg/mimir/mimir.go Outdated Show resolved Hide resolved
@@ -303,7 +303,7 @@ func newIngesterMetrics(
activeSeriesPerUser: promauto.With(activeSeriesReg).NewGaugeVec(prometheus.GaugeOpts{
Name: "cortex_ingester_active_series",
Help: "Number of currently active series per user.",
}, []string{"user"}),
}, []string{"user", "attrib"}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To follow the pattern set by GEL, we would create a new metric instead of adding to the existing one.

  1. This new metric would be exposed on a separate endpoint apart from /metrics called /usage_metrics
  2. The label name should be dynamic and match the configured name, instead of using a constant attrib name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is fixed in current PR.

@ying-jeanne ying-jeanne force-pushed the poc-cost-attribution branch 2 times, most recently from 9320256 to 7e628c3 Compare September 26, 2024 14:20
@ying-jeanne ying-jeanne force-pushed the poc-cost-attribution branch 2 times, most recently from 9e4a0b3 to 503cd2d Compare October 5, 2024 20:40
Copy link
Contributor

@colega colega left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a quick pass, left some comments, didn't finish reviewing everything because I have two big concerns right now:

  • The current implementation will use quite a lot of memory resources in active series tracker, I left appropriate comments on that.
  • I don't think that the current implementation will handle correctly a configuration change on the cost attribution label and back, i.e.:
    • Current label is a, series is created.
    • Label is changed to b, counters in the series stripes are removed
    • Label is changed back to a
    • Active series are purged, since label is the same as when it was created, the counters which are already zero are decremented and are going negative.

@@ -0,0 +1,158 @@
package caimpl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this subpackage?

Copy link
Contributor Author

@ying-jeanne ying-jeanne Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to isolate the interface and implementations, all function needs to be called out side of costattribution would present in the interface file manager.go and depends on that package only.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally find this approach strange and unnecessary. Why do we need to isolate that? We don't do that anywhere.

I would move everything into the costattribution package.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to avoid import cycle, some projects do this, and easy mocks with interface. 👍 since we don't use this in Mimir would remove it to cost attribution as requested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest building the mocks in a different package, so production code just needs its production package, but test packages would import the mock packages. If you use mockery to generate them, you can use --outpkg and --output flags for that.

Otherwise you'd need to put your mocks in the production interface package, and import two packages from your prod package.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have a preference in Mimir? I will follow our guideline. In grafana/grafana people are against mockery and prefer manual fake.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we've used mockery so far, but I'm not telling you to use mockery, the tool you use for the mock is orthogonal to the mock placement.

pkg/distributor/distributor.go Outdated Show resolved Hide resolved
pkg/costattribution/caimpl/managerImpl.go Outdated Show resolved Hide resolved
@@ -73,23 +77,38 @@ type seriesStripe struct {
activeMatchingNativeHistograms []uint32 // Number of active entries (only native histograms) in this stripe matching each matcher of the configured Matchers.
activeNativeHistogramBuckets uint32 // Number of buckets in active native histogram entries in this stripe. Only decreased during purge or clear.
activeMatchingNativeHistogramBuckets []uint32 // Number of buckets in active native histogram entries in this stripe matching each matcher of the configured Matchers.
userID string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid copying the user in all stripes?

pkg/ingester/activeseries/active_series.go Outdated Show resolved Hide resolved
@@ -212,6 +231,20 @@ func (c *ActiveSeries) ActiveWithMatchers() (total int, totalMatching []int, tot
return
}

func (c *ActiveSeries) ActiveByAttributionValue(calb string) map[string]uint32 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (c *ActiveSeries) ActiveByAttributionValue(calb string) map[string]uint32 {
func (c *ActiveSeries) ActiveByAttributionValue(label string) map[string]uint32 {

pkg/ingester/activeseries/active_series.go Outdated Show resolved Hide resolved
@@ -456,6 +514,8 @@ func (s *seriesStripe) purge(keepUntil time.Time) {
s.deleted.purge(ref)
}
delete(s.refs, ref)
// here need to find what is deleted and decrement counters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a TODO, please use a TODO comment.

pkg/costattribution/caimpl/tracker.go Outdated Show resolved Hide resolved
pkg/costattribution/caimpl/tracker_group.go Outdated Show resolved Hide resolved
@ying-jeanne ying-jeanne force-pushed the poc-cost-attribution branch 2 times, most recently from 844c20f to 2736755 Compare October 16, 2024 16:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants