Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

evaluate telemetry library #1040

Closed
amoore877 opened this issue Jul 31, 2019 · 6 comments
Closed

evaluate telemetry library #1040

amoore877 opened this issue Jul 31, 2019 · 6 comments

Comments

@amoore877
Copy link
Member

Currently telemetry package used is https://github.com/armon/go-metrics, which provides telemetry via prometheus, statsd, dogstatsd, and inmem. I propose getting off of this library either entirely or partially in favor of another.

8 error(s) occurred:
* collected metric "spire_server_node_api_x509_svid_fetch_elapsed_time" { label:<name:"host" value:"someObscuredHost" > label:<name:"spiffe_id" value:"spiffe://someObscuredSpiffe" > summary:<sample_count:2 sample_sum:11515.1171875 quantile:<quantile:0.5 value:nan > quantile:<quantile:0.9 value:nan > quantile:<quantile:0.99 value:nan > > } was collected before with the same name and label values

Overall it may be more effective to transition to a different library.

@amoore877 amoore877 changed the title change telemetry library used evaluate telemetry library Jul 31, 2019
@amoore877
Copy link
Member Author

amoore877 commented Jul 31, 2019

I propose the alternative of https://github.com/uber-go/tally, which provides prometheus, m3, and statsd implementations.
Additionally:

  • It is active
  • It is an Uber-supported project
  • It has actual release cycles
  • It it generally more configurable than the current library

@amoore877
Copy link
Member Author

As a side issue, it is inconvenient to have go-metrics be a forced common middleware for all metrics plugins. It is understandable since the existing built-in metrics plugins are all also from go-metrics, but this also forces all plugins into go-metrics logic. For instance, MeasureSince is actually a go-metrics wrapper around the plugins' AddSample, which is deceptive since we'd expect when bringing in a plugin that MeasureSince calls should be going to the plugin's MeasureSince.

@evan2645
Copy link
Member

I'm not quite sure what you mean by metrics plugins.

The interface that the telemetry package exposes is modeled after the go-metrics interface because that was the simplest thing to do and it seemed likely that such an interface could map easily onto other libraries in the even that we decided to switch it out. It was specifically meant to act as a layer of indirection over whatever metrics library we want to use

@amoore877
Copy link
Member Author

I'm not quite sure what you mean by metrics plugins.

I am referring to prometheus, statsd, etc... + any future things that may be added. Sure they aren't really plugins, but I feel like they are nearly there so I frequently accidentally refer to them like that.
^this does also bring up the tangent that it would be nice if telemetry followed plugin structure, specifically so consumers could use their own external plugins.

It was specifically meant to act as a layer of indirection over whatever metrics library we want to use

That's not quite all it is doing though. Instead it is also making decisions on when APIs actually get called and with what precise data, units, etc. For instance:

func (m *Metrics) MeasureSinceWithLabels(key []string, start time.Time, labels []Label) {
	if m.HostName != "" && m.EnableHostnameLabel {
		labels = append(labels, Label{"host", m.HostName})
	}
	if m.EnableTypePrefix {
		key = insert(0, "timer", key)
	}
	if m.ServiceName != "" {
		if m.EnableServiceLabel {
			labels = append(labels, Label{"service", m.ServiceName})
		} else {
			key = insert(0, m.ServiceName, key)
		}
	}
	allowed, labelsFiltered := m.allowMetric(key, labels)
	if !allowed {
		return
	}
	now := time.Now()
	elapsed := now.Sub(start)
	msec := float32(elapsed.Nanoseconds()) / float32(m.TimerGranularity)
	m.sink.AddSampleWithLabels(key, msec, labelsFiltered)
}

there is a great deal of configuration hidden away that is making decisions for every metrics sink. IMO whatever indirection layer we have should solely be calling the individual sinks and giving them the required data, not performing other operations that should be on the implementation's side. Related would be the idea of when multiple metrics implementations are in use, and perhaps each one would like a separate config.

@evan2645
Copy link
Member

The sink logic is fairly specific to go-metrics, I don't know if we'd even keep any of it if we move away. The interface I was referring to is our own telemetry.Metrics. So long as that interface is stable then swapping out shouldn't be too involved.

The MeasureSince method on that interface is commented:

// A convenience function for measuring elapsed time with a single line

While go-metrics does this for us today, moving away from go-metrics might mean taking some of that logic in and doing it ourselves.

I don't think we should look at the sinks as anything permanent. The fact that they are tightly coupled to go-metrics is an implementation detail internal to the telemetry package, and they are tightly coupled because they are themselves an artifact of how go-metrics works.

@rturner3
Copy link
Collaborator

rturner3 commented May 9, 2022

We have not heard any significant issues related to our metrics framework, and this issue hasn't had any activity for over two years. Closing due to staleness.

@rturner3 rturner3 closed this as completed May 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants