This repository has been archived by the owner on Oct 3, 2023. It is now read-only.

Don't trace health endpoints #151

Open
rakyll opened this issue Jul 25, 2018 · 16 comments

Comments

@rakyll
Contributor

rakyll commented Jul 25, 2018

Tracing canonical HTTP health endpoints such as /healthz and /_ah/health generates a large number of traces. This adds significant cost to collecting and storing spans for these endpoints, and adds noise when visualizing traces.

HTTP tracing integrations should by default disable the tracing of:

  • /healthz
  • /_ah/health

Other canonical health endpoints can later be added to the list.

@semistrict
Contributor

Is there any gRPC equivalent?

@rakyll
Contributor Author

rakyll commented Jul 25, 2018

See gRPC's canonical health checking reference: https://github.com/grpc/grpc/blob/master/doc/health-checking.md.

@mtwo
Contributor

mtwo commented Aug 13, 2018

Did we end up implementing this, or is it still under discussion? If it's still under discussion, I suspect the simplest approach is a blacklist that is maintained on GitHub and shipped with every build of the library. URLs or methods that everyone should exclude can be added through a PR, and users who need to blacklist additional URLs or methods specific to their app can do so via an API or config.

Thoughts?
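A rough sketch of that two-layer idea in Go: a list shipped with the library plus app-specific additions supplied at construction time (here parsed from a config string). `builtinSkipList`, `SkipList`, and `NewSkipList` are hypothetical names chosen for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// builtinSkipList is a hypothetical list shipped with the library and
// extended through pull requests.
var builtinSkipList = []string{"/healthz", "/_ah/health"}

// SkipList combines the built-in entries with app-specific ones.
type SkipList struct {
	paths map[string]struct{}
}

// NewSkipList returns a SkipList seeded with the built-in entries plus
// any extra paths supplied by the application (API or config).
func NewSkipList(extra ...string) *SkipList {
	s := &SkipList{paths: make(map[string]struct{})}
	for _, p := range builtinSkipList {
		s.paths[p] = struct{}{}
	}
	for _, p := range extra {
		s.paths[p] = struct{}{}
	}
	return s
}

// Skip reports whether requests to path should not be traced.
func (s *SkipList) Skip(path string) bool {
	_, ok := s.paths[path]
	return ok
}

func main() {
	// App-specific additions, e.g. read from a comma-separated config value.
	cfg := "/readyz,/metricsz"
	s := NewSkipList(strings.Split(cfg, ",")...)
	for _, p := range []string{"/healthz", "/readyz", "/orders"} {
		fmt.Println(p, s.Skip(p))
	}
}
```

The built-in slice is only read, never modified, so each application gets its own merged set without affecting others.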

@semistrict
Contributor

I don't like the idea of just blocking some arbitrary selection of paths. I think that if we had a better default sampler (per #156), users wouldn't resort to enabling 100% sampling, which is problematic for any production traffic, not just health checking.
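For context, a trace-ID-based probability sampler keeps a fixed fraction of traces and makes the same decision for a given trace in every service. The sketch below is a stdlib-only approximation of that technique, modeled on how we understand OpenCensus Go's `trace.ProbabilitySampler` to work (comparing the top 63 bits of the first 8 trace-ID bytes against `fraction * 2^63`); `TraceID` and `probabilitySampler` here are local illustrative definitions, not the library's types.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// TraceID is a 16-byte trace identifier.
type TraceID [16]byte

// probabilitySampler returns a function that keeps roughly `fraction` of
// traces. The decision depends only on the trace ID, so every service that
// sees the same trace makes the same choice.
func probabilitySampler(fraction float64) func(TraceID) bool {
	if fraction >= 1 {
		return func(TraceID) bool { return true }
	}
	if fraction <= 0 {
		return func(TraceID) bool { return false }
	}
	upperBound := uint64(fraction * (1 << 63))
	return func(id TraceID) bool {
		// Top 63 bits of the first 8 bytes: a uniform value in [0, 2^63).
		x := binary.BigEndian.Uint64(id[0:8]) >> 1
		return x < upperBound
	}
}

func main() {
	sample := probabilitySampler(0.01)
	const n = 100000
	kept := 0
	for i := 0; i < n; i++ {
		var id TraceID
		if _, err := rand.Read(id[:]); err != nil {
			panic(err)
		}
		if sample(id) {
			kept++
		}
	}
	fmt.Printf("kept %d of %d random traces (about 1%% expected)\n", kept, n)
}
```

With a low default rate like this, health checks are still traced occasionally, but at a cost proportional to the chosen fraction rather than to every request.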

@mtwo
Contributor

mtwo commented Aug 14, 2018

I see your point, but I still feel there's a need to exclude certain endpoints. For example, seeing traces of Profiler requests to Stackdriver drives me nuts: they're high latency, they throw off my views, and I really don't care about their performance. We also can't expect the libraries that generate these requests to integrate with OpenCensus solely to exclude themselves from sampling.

@semistrict
Contributor

@mtwo

For example, seeing traces of Profiler requests to Stackdriver drives me nuts: they're high latency and throw off my views, and I really don't care about their performance.

Resolving this issue as proposed won't help with that. This issue is about avoiding server spans related to health checking. What you're describing are client spans (and unrelated to health checking). I think we should have a separate issue for that.

@mtwo
Contributor

mtwo commented Aug 14, 2018

That's right, sorry! I'll create a separate issue.

@SergeyKanzhelev
Member

SergeyKanzhelev commented Sep 25, 2018

Should there be a common solution? One option I can think of is a special tag or attribute, such as "SyntheticSource", on spans produced by SDK-internal processing code, so that an exporter can decide to throw away anything that carries it. The same pattern could later be used for calls from availability ping tests.
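A minimal sketch of the exporter side of that idea: a wrapper that drops any span carrying the marker attribute before passing the rest on. `SpanData` and `Exporter` here are simplified local stand-ins loosely shaped like OpenCensus Go's exporter interface, and `dropSynthetic` and `recordingExporter` are hypothetical names.

```go
package main

import "fmt"

// SpanData is a simplified, hypothetical stand-in for an exported span.
type SpanData struct {
	Name       string
	Attributes map[string]interface{}
}

// Exporter receives finished spans.
type Exporter interface {
	ExportSpan(s *SpanData)
}

// syntheticAttr is the marker attribute suggested above.
const syntheticAttr = "SyntheticSource"

// dropSynthetic wraps an exporter and discards spans carrying the marker.
type dropSynthetic struct{ next Exporter }

func (d dropSynthetic) ExportSpan(s *SpanData) {
	if _, ok := s.Attributes[syntheticAttr]; ok {
		return // SDK-internal or synthetic traffic: not exported
	}
	d.next.ExportSpan(s)
}

// recordingExporter stands in for a real backend and records span names.
type recordingExporter struct{ exported []string }

func (r *recordingExporter) ExportSpan(s *SpanData) {
	r.exported = append(r.exported, s.Name)
	fmt.Println("exported:", s.Name)
}

func main() {
	backend := &recordingExporter{}
	exp := dropSynthetic{next: backend}
	exp.ExportSpan(&SpanData{Name: "/api/orders"})
	exp.ExportSpan(&SpanData{
		Name:       "/healthz",
		Attributes: map[string]interface{}{syntheticAttr: "health-check"},
	})
}
```

Note the trade-off: spans are still created and recorded, so this reduces storage and noise but not the in-process cost that sampling-time filtering would avoid.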

@rakyll
Contributor Author

rakyll commented Sep 25, 2018

We are also considering a transport-specific sampling policy setting that can be influenced by the incoming request. This would allow filtering by HTTP path, RPC name, etc.

The current HTTP-specific spec change PR: #182
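To illustrate what a request-influenced policy might look like, here is a hedged Go sketch in which a policy inspects a transport-neutral view of the request and either vetoes sampling or defers to the default sampler. `Request`, `Decision`, `Policy`, `skipHealth`, and `firstDecision` are all hypothetical; the gRPC method name follows the health-checking protocol linked earlier (service `grpc.health.v1.Health`).

```go
package main

import (
	"fmt"
	"strings"
)

// Request is a hypothetical transport-neutral view of an incoming request.
type Request struct {
	Transport string // "http" or "grpc"
	Path      string // HTTP path, e.g. "/healthz"
	Method    string // gRPC full method, e.g. "/grpc.health.v1.Health/Check"
}

// Decision is the outcome of a request-influenced sampling policy.
type Decision int

const (
	Defer  Decision = iota // no opinion: fall back to the default sampler
	Never                  // do not sample this request
	Always                 // always sample this request
)

// Policy inspects a request before its span is started.
type Policy func(Request) Decision

// skipHealth never samples canonical HTTP health paths or gRPC health RPCs.
func skipHealth(r Request) Decision {
	switch r.Transport {
	case "http":
		if r.Path == "/healthz" || r.Path == "/_ah/health" {
			return Never
		}
	case "grpc":
		if strings.HasPrefix(r.Method, "/grpc.health.v1.Health/") {
			return Never
		}
	}
	return Defer
}

// firstDecision returns the first non-Defer decision among policies.
func firstDecision(r Request, policies ...Policy) Decision {
	for _, p := range policies {
		if d := p(r); d != Defer {
			return d
		}
	}
	return Defer
}

func main() {
	names := map[Decision]string{Defer: "defer", Never: "never", Always: "always"}
	reqs := []Request{
		{Transport: "http", Path: "/healthz"},
		{Transport: "grpc", Method: "/grpc.health.v1.Health/Check"},
		{Transport: "http", Path: "/orders"},
	}
	for _, r := range reqs {
		fmt.Printf("%+v -> %s\n", r, names[firstDecision(r, skipHealth)])
	}
}
```

Keeping a "defer" outcome lets filtering rules compose with whatever default sampler is configured instead of replacing it.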

@rakyll
Contributor Author

rakyll commented Oct 8, 2018

I think we can close this issue by recommending that library implementations provide filtering options based on #182.

@montanaflynn

Is this closed, or is there a way to filter traces by sampling options in Go? I couldn't find one.

We're using OpenCensus on AWS, and the ELB health checks are generating a lot of traces. In addition, they just go to /, so having something configurable would be ideal, possibly even by User-Agent, since it's set to ELB-HealthChecker/2.0.

@montanaflynn

montanaflynn commented Apr 3, 2019

Actually, I just figured it out using GetStartOptions:

&ochttp.Handler{
	Handler: handler,
	GetStartOptions: func(r *http.Request) trace.StartOptions {
		startOptions := trace.StartOptions{}
		if r.UserAgent() == "ELB-HealthChecker/2.0" {
			startOptions.Sampler = trace.NeverSample()
		}
		return startOptions
	},
},

@hixichen

hixichen commented Aug 13, 2019

Thanks to @montanaflynn

// SkipedSampleAPIs skip tracing sample data for these apis
var SkipedSampleAPIs = map[string]bool{
	"/readyz":   true,
	"/metricsz": true,
	"/healthz":  true,
}

handler = &ochttp.Handler{
	Handler: handler,
	GetStartOptions: func(r *http.Request) trace.StartOptions {
		startOptions := trace.StartOptions{}
		if SkipedSampleAPIs[r.URL.Path] {
			startOptions.Sampler = trace.NeverSample()
		}
		return startOptions
	},
}

@lunemec

lunemec commented Jan 2, 2020

Sadly, this won't help for a gRPC server, as the ocgrpc plugin does not offer a GetStartOptions function for making per-request sampling decisions. 😞

@rhzs

rhzs commented Sep 24, 2020

It is now easier, and more correct, to skip private/health endpoints with the IsHealthEndpoint callback, which skips tracing of matching requests entirely.

// SkipedSampleAPIs skip tracing sample data for these apis
var SkipedSampleAPIs = map[string]bool{
	"/readyz":   true,
	"/metricsz": true,
	"/healthz":  true,
}

handler = &ochttp.Handler{
	Handler: handler,
	IsHealthEndpoint: func(r *http.Request) bool {
		return SkipedSampleAPIs[r.URL.Path]
	},
}

@0x726d77

It would still be great to have a standard option to exclude the grpc.health.* RPCs.

In the meantime, it is possible to use a custom sampler with ocgrpc; it's just a little more cumbersome than the IsHealthEndpoint option that ochttp provides.

sampler := func(fraction float64) trace.Sampler {
	ps := trace.ProbabilitySampler(fraction)
	return func(params trace.SamplingParameters) trace.SamplingDecision {
		// Never sample spans for the gRPC health-checking service.
		if strings.HasPrefix(params.Name, "grpc.health") {
			return trace.SamplingDecision{Sample: false}
		}
		// Defer everything else to the probability sampler.
		return ps(params)
	}
}(0.25) // <- sample rate from environment, config, etc.

options := trace.StartOptions{Sampler: sampler}
handler := ocgrpc.ServerHandler{StartOptions: options}
server := grpc.NewServer(grpc.StatsHandler(&handler))
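Why the `grpc.health` prefix works: as far as we can tell from the ocgrpc source, server span names are derived from the gRPC full method name by stripping the leading `/` and replacing the remaining `/` with `.`. The stdlib-only sketch below reproduces that mapping under that assumption; `spanNameFromMethod` and `isHealthSpan` are hypothetical helpers, not ocgrpc functions.

```go
package main

import (
	"fmt"
	"strings"
)

// spanNameFromMethod mimics (as an assumption) how a gRPC full method name
// such as "/grpc.health.v1.Health/Check" becomes a span name.
func spanNameFromMethod(fullMethod string) string {
	name := strings.TrimPrefix(fullMethod, "/")
	return strings.ReplaceAll(name, "/", ".")
}

// isHealthSpan applies the same prefix test as the sampler above.
func isHealthSpan(spanName string) bool {
	return strings.HasPrefix(spanName, "grpc.health")
}

func main() {
	for _, m := range []string{"/grpc.health.v1.Health/Check", "/helloworld.Greeter/SayHello"} {
		n := spanNameFromMethod(m)
		fmt.Printf("%s -> %s (health=%v)\n", m, n, isHealthSpan(n))
	}
}
```

If a service's own package name could also start with `grpc.health`, a stricter prefix such as `grpc.health.v1.Health.` avoids accidental matches.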

9 participants