Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for python instrumentation #69

Merged
merged 7 commits into from
Feb 1, 2024
Merged

Add support for python instrumentation #69

merged 7 commits into from
Feb 1, 2024

Conversation

lisguo
Copy link
Contributor

@lisguo lisguo commented Jan 30, 2024

Description of changes:

  • Adds support for python instrumentation in the amazon-cloudwatch-agent-operator
  • Uses the upstream otel python sdk: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:0.43b0
  • Mutates a pod with the annotation instrumentation.opentelemetry.io/inject-python: true:
Namespace:        default
Priority:         0
Service Account:  default
Node:             ip-192-168-52-224.ec2.internal/192.168.52.224
Start Time:       Tue, 30 Jan 2024 16:54:35 -0500
Labels:           app=hello-python
                  pod-template-hash=7974765f9c
Annotations:      instrumentation.opentelemetry.io/inject-python: true
Status:           Running
IP:               192.168.59.186
IPs:
  IP:           192.168.59.186
Controlled By:  ReplicaSet/hello-python-7974765f9c
Init Containers:
  opentelemetry-auto-instrumentation-python:
    Container ID:  containerd://e8de796162b43a9fb94b392590de7e2d5883e09a4e1ef3540b0ace0dc4301aff
    Image:         ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:0.43b0
    Image ID:      ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python@sha256:0477dbde4f5d462ac6d21a8721a9867e82d2bc5fb3088cbcc02ca93c6cd54798
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
      -a
      /autoinstrumentation/.
      /otel-auto-instrumentation-python
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 30 Jan 2024 16:54:35 -0500
      Finished:     Tue, 30 Jan 2024 16:54:36 -0500
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /otel-auto-instrumentation-python from opentelemetry-auto-instrumentation-python (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9qq9x (ro)
Containers:
  hello-python:
    Container ID:   containerd://abd2e46abcbca8c02e8a84c633d2b5babfe1c04d0cde399bce0edebb940b2a25
    Image:          355660395157.dkr.ecr.us-east-1.amazonaws.com/hello-python:latest
    Image ID:       355660395157.dkr.ecr.us-east-1.amazonaws.com/hello-python@sha256:482587ea288da5e1fbdffb097920f65c1320a5670740af8d2aaf0964a38aca94
    Port:           5000/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Tue, 30 Jan 2024 16:54:37 -0500
    Ready:          True
    Restart Count:  0
    Environment:
      OTEL_SMP_ENABLED:                     true
      OTEL_TRACES_SAMPLER_ARG:              endpoint=http://cloudwatch-agent.amazon-cloudwatch:2000
      OTEL_EXPORTER_OTLP_PROTOCOL:          http/protobuf
      OTEL_EXPORTER_OTLP_TRACES_ENDPOINT:   http://cloudwatch-agent.amazon-cloudwatch:4316/v1/traces
      OTEL_AWS_SMP_EXPORTER_ENDPOINT:       http://cloudwatch-agent.amazon-cloudwatch:4315
      OTEL_METRICS_EXPORTER:                none
      OTEL_PYTHON_DISTRO:                   aws-distro
      OTEL_PYTHON_CONFIGURATOR:             aws-configurator
      PYTHONPATH:                           /otel-auto-instrumentation-python/opentelemetry/instrumentation/auto_instrumentation:/otel-auto-instrumentation-python
      OTEL_TRACES_EXPORTER:                 otlp
      OTEL_EXPORTER_OTLP_TRACES_PROTOCOL:   http/protobuf
      OTEL_EXPORTER_OTLP_METRICS_PROTOCOL:  http/protobuf
      OTEL_SERVICE_NAME:                    hello-python
      OTEL_RESOURCE_ATTRIBUTES_POD_NAME:    hello-python-7974765f9c-m7vcv (v1:metadata.name)
      OTEL_RESOURCE_ATTRIBUTES_NODE_NAME:    (v1:spec.nodeName)
      OTEL_PROPAGATORS:                     tracecontext,baggage,b3,xray
      OTEL_RESOURCE_ATTRIBUTES:             k8s.container.name=hello-python,k8s.deployment.name=hello-python,k8s.namespace.name=default,k8s.node.name=$(OTEL_RESOURCE_ATTRIBUTES_NODE_NAME),k8s.pod.name=$(OTEL_RESOURCE_ATTRIBUTES_POD_NAME),k8s.replicaset.name=hello-python-7974765f9c,service.version=latest
    Mounts:
      /otel-auto-instrumentation-python from opentelemetry-auto-instrumentation-python (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9qq9x (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-9qq9x:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
  opentelemetry-auto-instrumentation-python:
    Type:        EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:   200Mi
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  55s   default-scheduler  Successfully assigned default/hello-python-7974765f9c-m7vcv to ip-192-168-52-224.ec2.internal
  Normal  Pulled     55s   kubelet            Container image "ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:0.43b0" already present on machine
  Normal  Created    55s   kubelet            Created container opentelemetry-auto-instrumentation-python
  Normal  Started    55s   kubelet            Started container opentelemetry-auto-instrumentation-python
  Normal  Pulling    53s   kubelet            Pulling image "355660395157.dkr.ecr.us-east-1.amazonaws.com/hello-python:latest"
  Normal  Pulled     53s   kubelet            Successfully pulled image "355660395157.dkr.ecr.us-east-1.amazonaws.com/hello-python:latest" in 149ms (149ms including waiting)
  Normal  Created    53s   kubelet            Created container hello-python
  Normal  Started    53s   kubelet            Started container hello-python

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

jefchien
jefchien previously approved these changes Jan 31, 2024
@@ -47,6 +47,9 @@ manager:
java:
repository: public.ecr.aws/aws-observability/adot-autoinstrumentation-java
tag: v1.31.1
python:
repository: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we're using GHCR instead of ECR? Is it not available yet?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is temporary until the ADOT python image is available.

main.go Outdated
Comment on lines 94 to 96
autoInstrumentationPython string
autoAnnotationConfigStr string
webhookPort int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Is this just a weird formatting issue? Did you run make fmt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is what i get for resolving merge conflicts from the github ui sigh. I knew I should've used cli

defaultAPIVersion = "cloudwatch.aws.amazon.com/v1alpha1"
defaultInstrumenation = "java-instrumentation"
defaultNamespace = "default"
defaultKind = "Instrumentation"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have any code change? Or we are just reformatting the code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just reformatting. I decided to group different attributes by different use cases

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does default instrumentation and config now mean when we have multiple instrumentations? I don't think OTel operator has a default instrumentation. It will inject whatever instrumentation is defined via the annotation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the Instrumentation CR defines multiple languages. The default instrumentation will mutate pods even if the customer has no Instrumentation CR on their cluster.

This file has the definition of what the environment variables we will inject when pods are annotated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default instrumentation will mutate pods even if the customer has no Instrumentation CR on their cluster.

Still not clear why this is needed to be done.

@@ -32,13 +33,23 @@ const (
otelExporterSmpEndpointDefaultValue = "http://cloudwatch-agent.amazon-cloudwatch:4315"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use OTEL_EXPORTER_OTLP_METRICS_PROTOCOL: http/protobuf, do we need metrics endpoint to be something like: http://cloudwatch-agent.amazon-cloudwatch:4316/v1/metrics?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure, I left these variables to be the same as java for now at least to be backwards compatible.

We can always revisit this variables in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only care about the OTEL_AWS_SMP_EXPORTER_ENDPOINT for appsignals metrics. If the endpoint says port 4315 (which it will for EKS) then the SMP exporter will be a grpc exporter.

@@ -32,13 +33,23 @@ const (
otelExporterSmpEndpointDefaultValue = "http://cloudwatch-agent.amazon-cloudwatch:4315"
otelExporterMetricKey = "OTEL_METRICS_EXPORTER"
otelExporterMetricDefaultValue = "none"

otelPythonDistro = "OTEL_PYTHON_DISTRO"
otelPythonDistroDefaultValue = "aws-distro"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be "aws_distro"

otelPythonDistro = "OTEL_PYTHON_DISTRO"
otelPythonDistroDefaultValue = "aws-distro"
otelPythonConfigurator = "OTEL_PYTHON_CONFIGURATOR"
otelPythonConfiguratorDefaultValue = "aws-configurator"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be "aws_configurator"

Copy link
Contributor

@srprash srprash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly went through the configuration stuff and LGTM, except aws_distro and aws_configurator changes. TY

defaultNamespace = "default"
defaultKind = "Instrumentation"
defaultAPIVersion = "cloudwatch.aws.amazon.com/v1alpha1"
defaultInstrumenation = "java-instrumentation"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to look very weird. A follow up to unravel the implications of changing this for existing customers would be good so we atleast understand.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we never actually create this resource on the cluster.. its just referenced in memory.
So it might be ok to rename this.

We need to test out the scenario to be sure... but this is non-blocking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean the naming of the instrumentation cr itself? yeah because this is no longer just java.

Copy link
Contributor

@sky333999 sky333999 Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup thats right since this is no longer just java and currently we call it java-instrumentation. No one will ever see this since its not actually created on the cluster and only referenced below by the operator.. so not a blocker.

defaultInstrumenation = "java-instrumentation"
defaultNamespace = "default"
defaultKind = "Instrumentation"
defaultAPIVersion = "cloudwatch.aws.amazon.com/v1alpha1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally find it hard to navigate these consts back & forth between the usages below and the values here when trying to see what the instrumentation CR actually looks like.
If these consts are not used in more than one place, I find inlining them to be more readable.. but might just be me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I was debating on that too -- the only thing is that we do re-use the same env vars for each lang..maybe having an env map would be better

@lisguo lisguo merged commit e8186e5 into main Feb 1, 2024
4 checks passed
@mitali-salvi mitali-salvi deleted the python branch June 25, 2024 18:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants