[FEATURE] Enhancing observability customisations #1462

Open · apodgorbunschih opened this issue Dec 11, 2024 · 7 comments
Labels: enhancement (New feature or request)

@apodgorbunschih
Requirements

Overview

Consider implementing options to include extra tags and metadata for traces, metrics, and logs. This enhancement could provide valuable insights in production environments.

Current situation

Metrics

Documentation

The metrics that are exposed lack the information needed in a production environment.
For example, the http.server.duration metric measures the duration of inbound HTTP requests.

core/pkg/telemetry/metrics.go:169 Creation of the meter

...
	// we can ignore errors from OpenTelemetry since they could occur if we select the wrong aggregator
	hduration, _ := meter.Float64Histogram(
		httpRequestDurationMetric,
		metric.WithDescription("Measures the duration of inbound HTTP requests."),
		metric.WithUnit("s"),
	)
...

core/pkg/telemetry/metrics.go:80 Usage of the meter

...
func (r MetricsRecorder) HTTPRequestDuration(ctx context.Context, duration time.Duration, attrs []attribute.KeyValue) {
	r.httpRequestDurHistogram.Record(ctx, duration.Seconds(), metric.WithAttributes(attrs...))
}
...

The attributes that are registered are hardcoded and can only be enriched at the collector level (this is not something we want to do, because the collector gathers information from multiple sources).

Example of a metric in Prometheus

{exported_job="flagd", http_method="POST", http_status_code="200", http_url="/flagd.evaluation.v1.Service/ResolveBoolean", instance="otel-collector:8889", job="otel-collector", label1="value1", service_name="flagd"}

For example, this is how it looks in a DataDog environment:
[screenshot: DataDog view of the metric]

Requirements

Be able to configure logs, metrics, and traces with additional metadata/tags to enrich observability. We would like to add tags such as application, region, and zone so that we can better understand our system.
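
As an illustration only (the helper, attribute keys, and values below are hypothetical, not existing flagd code), a minimal sketch of what enrichment at the recorder level could look like, building on the HTTPRequestDuration signature shown above:

package telemetry

import (
	"context"
	"time"

	"go.opentelemetry.io/otel/attribute"
)

// extraDimensions holds static, deployment-specific attributes; the keys and
// values are placeholders.
var extraDimensions = []attribute.KeyValue{
	attribute.String("application", "checkout"),
	attribute.String("region", "eu-west-1"),
	attribute.String("zone", "a"),
}

// recordWithExtraDimensions sketches how the existing recorder could be wrapped
// so that every data point carries the extra dimensions in addition to the
// hardcoded HTTP attributes.
func recordWithExtraDimensions(ctx context.Context, r MetricsRecorder, d time.Duration, attrs []attribute.KeyValue) {
	r.HTTPRequestDuration(ctx, d, append(attrs, extraDimensions...))
}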

apodgorbunschih added the enhancement (New feature or request) and Needs Triage (This issue needs to be investigated by a maintainer) labels on Dec 11, 2024
@beeme1mr (Member)

Hey @apodgorbunschih, perhaps we could support custom resource attributes via the OTEL_RESOURCE_ATTRIBUTES environment variable. Would this address your concern?

It may also be a good opportunity to update all telemetry-related environment variables to comply with OpenTelemetry standards. Here's a related issue for that.

@apodgorbunschih (Author) commented Dec 11, 2024

@beeme1mr Michael, thanks again for such a quick reply ❤

Could you please help me understand: by supporting this attribute in flagd, would I be able to configure the resources per flagd container or per collector? The reason I am asking is that I have a collector in a k8s cluster, and in that cluster I have multiple applications (with flagd as a sidecar) that send data to the collector. I would like the configuration to be at the flagd level, not at the collector level 🙏

@beeme1mr (Member)

Hey @apodgorbunschih, looking a bit deeper into this, I'm wondering if this may work without requiring a code change 🤔. It appears that the OTel Go SDK will merge the default resources with those defined in the OTEL_RESOURCE_ATTRIBUTES environment variable. I THINK the resources should be applied as attributes.
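
As a rough, self-contained illustration (not flagd code; the attribute values are placeholders), this shows the Go SDK picking up OTEL_RESOURCE_ATTRIBUTES and merging it with the default resource:

package main

import (
	"context"
	"fmt"
	"os"

	"go.opentelemetry.io/otel/sdk/resource"
)

func main() {
	// Placeholder values; in Kubernetes this would typically be set on the
	// flagd sidecar container.
	os.Setenv("OTEL_RESOURCE_ATTRIBUTES", "application=checkout,region=eu-west-1,zone=a")

	// WithFromEnv reads OTEL_RESOURCE_ATTRIBUTES (and OTEL_SERVICE_NAME).
	envRes, err := resource.New(context.Background(), resource.WithFromEnv())
	if err != nil {
		panic(err)
	}

	// Merging with the SDK defaults mirrors what the exporters end up with:
	// the default attributes plus the ones from the environment.
	merged, err := resource.Merge(resource.Default(), envRes)
	if err != nil {
		panic(err)
	}
	fmt.Println(merged.Attributes())
}

Running this prints the default resource attributes plus application, region, and zone, which the exporters then attach to the telemetry they send.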

@apodgorbunschih (Author)

Hey @beeme1mr 👋, yes, that would be good. I tried passing OTEL_RESOURCE_ATTRIBUTES and I can see the change in traces 👍
[screenshot: traces showing the additional resource attributes]
However, in Prometheus I don't have the additional tags.
[screenshot: Prometheus metric without the additional tags]

I am using the official documentation for the local configuration: https://flagd.dev/reference/monitoring/?h=moni#docker-composeyaml

@beeme1mr (Member)

We may have to explicitly pass additional dimensions to the metrics. Is this something you'd be interested in working on?

beeme1mr removed the Needs Triage (This issue needs to be investigated by a maintainer) label on Dec 12, 2024
@apodgorbunschih (Author) commented Dec 13, 2024

I may give it a try, @beeme1mr. Do you have any specific requirements? From my side, what I would expect or want to have is something like this:

flagd start [flags]

Observability options:
--metrics-metadata string               JSON representation of an array of key/value pairs
--traces-metadata string                JSON representation of an array of key/value pairs
--logs-metadata string                  JSON representation of an array of key/value pairs

Current options:
  -X, --context-value stringToString    add arbitrary key value pairs to the flag evaluation context (default [])
  -C, --cors-origin strings             CORS allowed origins, * will allow all origins
  -h, --help                            help for start
  -z, --log-format string               Set the logging format, e.g. console or json (default "console")
  -m, --management-port int32           Port for management operations (default 8014)
  -t, --metrics-exporter string         Set the metrics exporter. Default(if unset) is Prometheus. Can be override to otel - OpenTelemetry metric exporter. Overriding to otel require otelCollectorURI to be present
  -r, --ofrep-port int32                ofrep service port (default 8016)
  -A, --otel-ca-path string             tls certificate authority path to use with OpenTelemetry collector
  -D, --otel-cert-path string           tls certificate path to use with OpenTelemetry collector
  -o, --otel-collector-uri string       Set the grpc URI of the OpenTelemetry collector for flagd runtime. If unset, the collector setup will be ignored and traces will not be exported.
  -K, --otel-key-path string            tls key path to use with OpenTelemetry collector
  -I, --otel-reload-interval duration   how long between reloading the otel tls certificate from disk (default 1h0m0s)
  -p, --port int32                      Port to listen on (default 8013)
  -c, --server-cert-path string         Server side tls certificate path
  -k, --server-key-path string          Server side tls key path
  -d, --socket-path string              Flagd socket path. With grpc the service will become available on this address. With http(s) the grpc-gateway proxy will use this address internally.
  -s, --sources string                  JSON representation of an array of SourceConfig objects. This object contains 2 required fields, uri (string) and provider (string). Documentation for this object: https://flagd.dev/reference/sync-configuration/#source-configuration
  -g, --sync-port int32                 gRPC Sync port (default 8015)
  -f, --uri .yaml/.yml/.json            Set a sync provider uri to read data from, this can be a filepath, URL (HTTP and gRPC), FeatureFlag custom resource, or GCS or Azure Blob. When flag keys are duplicated across multiple providers the merge priority follows the index of the flag arguments, as such flags from the uri at index 0 take the lowest precedence, with duplicated keys being overwritten by those from the uri at index 1. Please note that if you are using filepath, flagd only supports files with .yaml/.yml/.json extension.

But I am not sure whether that fits the current vision and direction of the project.
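
As a rough sketch of how such a flag could be parsed (the flag names above, the metadataEntry shape, and the parseMetadataFlag helper are all hypothetical; the exact JSON format is still open):

package main

import (
	"encoding/json"
	"fmt"

	"go.opentelemetry.io/otel/attribute"
)

// metadataEntry assumes the "array of key/value" JSON looks like
// [{"key":"application","value":"checkout"}].
type metadataEntry struct {
	Key   string `json:"key"`
	Value string `json:"value"`
}

// parseMetadataFlag converts the proposed --metrics-metadata / --traces-metadata /
// --logs-metadata value into OpenTelemetry attributes.
func parseMetadataFlag(raw string) ([]attribute.KeyValue, error) {
	var entries []metadataEntry
	if err := json.Unmarshal([]byte(raw), &entries); err != nil {
		return nil, fmt.Errorf("invalid metadata flag %q: %w", raw, err)
	}
	attrs := make([]attribute.KeyValue, 0, len(entries))
	for _, e := range entries {
		attrs = append(attrs, attribute.String(e.Key, e.Value))
	}
	return attrs, nil
}

func main() {
	attrs, err := parseMetadataFlag(`[{"key":"application","value":"checkout"},{"key":"region","value":"eu-west-1"}]`)
	if err != nil {
		panic(err)
	}
	fmt.Println(attrs)
}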

@beeme1mr (Member)

Hey @apodgorbunschih, I talked to one of the maintainers of OpenTelemetry and they think OTEL_RESOURCE_ATTRIBUTES is still the way to go if the attributes are static. The reason you didn't see them in your test is that you need to explicitly enable resource attributes as labels in your OpenTelemetry Collector config.

https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/prometheusexporter

The easiest way to do this is to enable resource_to_telemetry_conversion like this:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
    # This is the new setting
    resource_to_telemetry_conversion:
      enabled: true
  otlp/jaeger:
    endpoint: jaeger-all-in-one:4317
    tls:
      insecure: true
processors:
  batch:
service:
  pipelines:
    traces:
      receivers: [ otlp ]
      processors: [ batch ]
      exporters: [ otlp/jaeger ]
    metrics:
      receivers: [ otlp ]
      processors: [ batch ]
      exporters: [ prometheus ]

However, this may not be a good idea if there are a large number of resources. The recommended approach is to apply a processor that extracts specific resource attributes as Prometheus labels.

processors:
  transform:
    metric_statements:
      - context: datapoint
        statements:
        - set(attributes["namespace"], resource.attributes["k8s.namespace.name"])
        - set(attributes["container"], resource.attributes["k8s.container.name"])
        - set(attributes["pod"], resource.attributes["k8s.pod.name"])

You can read more about it here.

Hope this helps!
