Opentelemetry error during ert run #9949

Closed
xjules opened this issue Feb 3, 2025 · 4 comments · Fixed by #10004

@xjules
Contributor

xjules commented Feb 3, 2025

While running drogon_design.ert on the current testing release I've got the following error:

2025-02-03 08:42:14,400 [ERROR] azure.monitor.opentelemetry.exporter.export._base: Data drop 400: 103: Field 'time' on type 'Envelope' is older than the allowed min date. Expected: now - 172800000ms {'additional_properties': {}, 'version': 1, 'name': 'Microsoft.ApplicationInsights.RemoteDependency', 'time': '2025-01-29T15:02:12.872982Z', 'sample_rate': 100, 'sequence': None, 'instrumentation_key': , 'tags': {<ContextTagKeys.AI_DEVICE_ID: 'ai.device.id'>: '', <ContextTagKeys.AI_DEVICE_LOCALE: 'ai.device.locale'>: 'en_US', <ContextTagKeys.AI_DEVICE_OS_VERSION: 'ai.device.osVersion'>: '#1 SMP Mon Dec 16 04:25:43 EST 2024', <ContextTagKeys.AI_DEVICE_TYPE: 'ai.device.type'>: 'Other', <ContextTagKeys.AI_INTERNAL_SDK_VERSION: 'ai.internal.sdkVersion'>: 'ulm_py3.11.11:otel1.29.0:ext1.0.0b32', <ContextTagKeys.AI_CLOUD_ROLE: 'ai.cloud.role'>: 'ert', <ContextTagKeys.AI_CLOUD_ROLE_INSTANCE: 'ai.cloud.roleInstance'>: '', <ContextTagKeys.AI_INTERNAL_NODE_NAME: 'ai.internal.nodeName'>: '', <ContextTagKeys.AI_OPERATION_ID: 'ai.operation.id'>: '552c364b281f3cefacd445a49020ae6b', <ContextTagKeys.AI_OPERATION_PARENT_ID: 'ai.operation.parentId'>: 'b9d1094516f16d09'}, 'data': <azure.monitor.opentelemetry.exporter._generated.models._models_py3.MonitorBase object at>}.
@xjules xjules moved this to Todo in SCOUT Feb 3, 2025
@xjules xjules added the bug label Feb 3, 2025
@HakonSohoel HakonSohoel self-assigned this Feb 4, 2025
@HakonSohoel HakonSohoel moved this from Todo to In Progress in SCOUT Feb 4, 2025
@HakonSohoel
Contributor

ApplicationInsights won't accept data with a timestamp older than 48h. This message seems to have been triggered when dark storage was shut down after running for several days, in which case information about the dark storage span is rejected by ApplicationInsights because its timestamp is too old.

@larsevj
Contributor

larsevj commented Feb 4, 2025

I get a similar error message each time I run drogon

ERROR    Data drop 400: 100: Field 'message' on type 'MessageData' is too long. Expected: 32768 characters {'additional_properties': {}, 'version': 1, 'name': 'Microsoft.ApplicationInsights.Message', 'time': '2025-02-04T11:56:25.117106Z', 'sample_rate': 100, 'sequence': None, 'instrumentation_key': '', 'tags': {<ContextTagKeys.AI_DEVICE_ID: 'ai.device.id'>:, <ContextTagKeys.AI_DEVICE_LOCALE: 'ai.device.locale'>: 'en_US', <ContextTagKeys.AI_DEVICE_OS_VERSION: 'ai.device.osVersion'>: '#1 SMP Mon Dec 16 04:25:43 EST 2024', <ContextTagKeys.AI_DEVICE_TYPE: 'ai.device.type'>: 'Other', <ContextTagKeys.AI_INTERNAL_SDK_VERSION: 'ai.internal.sdkVersion'>: 'ulm_py3.11.11:otel1.29.0:ext1.0.0b32', <ContextTagKeys.AI_CLOUD_ROLE: 'ai.cloud.role'>: 'unknown_service', <ContextTagKeys.AI_CLOUD_ROLE_INSTANCE: 'ai.cloud.roleInstance'>: 'Stavanger', <ContextTagKeys.AI_INTERNAL_NODE_NAME: 'ai.internal.nodeName'>: , <ContextTagKeys.AI_OPERATION_ID: 'ai.operation.id'>: , <ContextTagKeys.AI_OPERATION_PARENT_ID: 'ai.operation.parentId'>: }, 'data': <azure.monitor.opentelemetry.exporter._generated.models._models_py3.MonitorBase object at 0x7f378c3274d0>}.

@HakonSohoel
Contributor

HakonSohoel commented Feb 5, 2025

The 48h limit on old data seems to be hardcoded in ApplicationInsights:

"...Application Insights may drop telemetry that [...] is too old (over 48 hours) or too new (newer than 2+ hours), or fails other validation issues."

Spans are always sent after they have ended. The AzureMonitorTraceExporter converts each span to an envelope (TelemetryItem) and sets the envelope's time field to the span's start time. That time field is then used as the start time of dependencies/spans in Azure and Grafana.

In other words, a span that lasts longer than 48h is considered too old when it is sent to Azure and is therefore dropped/rejected, even though the span only just ended.
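
To make the arithmetic concrete, here is a minimal sketch of the age check using only the standard library. It is not the exporter's or ApplicationInsights' actual code: the helper name `envelope_is_too_old` is hypothetical, the exact boundary behaviour at 48h is an assumption, and the send time is taken from the log line's timestamp and treated as UTC purely for illustration.

```python
from datetime import datetime, timedelta, timezone

# 172800000 ms (48h), the limit quoted in the "Data drop 400: 103" message.
MAX_AGE = timedelta(milliseconds=172_800_000)


def envelope_is_too_old(span_start: datetime, sent_at: datetime) -> bool:
    """Hypothetical helper: True if an envelope stamped with the span's
    start time would be older than MAX_AGE at the moment it is sent."""
    return sent_at - span_start > MAX_AGE


# Envelope 'time' field from the reported error (the span's start time).
span_start = datetime(2025, 1, 29, 15, 2, 12, 872982, tzinfo=timezone.utc)
# Assumption: the log timestamp is used as the send time and read as UTC.
sent_at = datetime(2025, 2, 3, 8, 42, 14, tzinfo=timezone.utc)

print("span age at send time:", sent_at - span_start)
print("would be rejected:", envelope_is_too_old(span_start, sent_at))
```

The span is judged by when it started, not by when it ended, so any span that runs for more than two days fails this check no matter how promptly it is exported.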

Options:

  • Create a support ticket with Azure to see if I've missed something and there is a way to accept spans longer than 48h after all <= ticket created, follow up in Handle logging of spans longer than 48 hours #10031
  • Redefine spans in ert so that they never last more than 48h (<= would probably be quite messy)

@berland
Contributor

berland commented Feb 5, 2025

This issue is resolved when this warning can no longer appear in the terminal (it is fine to have it in the logs on disk).
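
One way this could be achieved with Python's standard `logging` module is sketched below. This is not necessarily what the fix in #10004 does: the function name `keep_exporter_errors_off_terminal` and the log file location are hypothetical, and the logger name is simply the package prefix of the logger seen in the error output above.

```python
import logging
import os
import tempfile


def keep_exporter_errors_off_terminal(log_path: str) -> logging.Logger:
    """Hypothetical helper: send exporter log records to a file only."""
    # Parent of 'azure.monitor.opentelemetry.exporter.export._base',
    # the logger name that appears in the reported error.
    exporter_logger = logging.getLogger("azure.monitor.opentelemetry.exporter")
    file_handler = logging.FileHandler(log_path)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s [%(levelname)s] %(name)s: %(message)s")
    )
    exporter_logger.addHandler(file_handler)
    # Stop records from bubbling up to the root logger's console handlers.
    exporter_logger.propagate = False
    return exporter_logger


# Assumption: a temporary directory stands in for the real log directory.
log_path = os.path.join(tempfile.mkdtemp(), "exporter.log")
exporter_logger = keep_exporter_errors_off_terminal(log_path)

# Simulate the exporter's error; it ends up in the file, not the terminal.
logging.getLogger("azure.monitor.opentelemetry.exporter.export._base").error(
    "Data drop 400: 103: Field 'time' on type 'Envelope' is older than the allowed min date."
)
```

Configuring the parent logger rather than the `_base` module logger means other modules in the exporter package are covered as well, at the cost of hiding any of their messages from the terminal too.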

@HakonSohoel HakonSohoel moved this from In Progress to Ready for Review in SCOUT Feb 6, 2025
@HakonSohoel HakonSohoel moved this from Ready for Review to In Progress in SCOUT Feb 6, 2025
@HakonSohoel HakonSohoel moved this from In Progress to Ready for Review in SCOUT Feb 6, 2025
@eivindjahren eivindjahren moved this from Ready for Review to Reviewed in SCOUT Feb 7, 2025
@github-project-automation github-project-automation bot moved this from Reviewed to Done in SCOUT Feb 11, 2025