Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

messaging: add celery to messaging system values #1954

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

xrmx
Copy link
Contributor

@xrmx xrmx commented Feb 28, 2025

Changes

Celery site here:
https://docs.celeryq.dev/en/stable/index.html

And we have an instrumentation for celery in opentelemetry-python-contrib.

Merge requirement checklist

@xrmx xrmx requested review from a team as code owners February 28, 2025 09:51
Copy link
Contributor

@lmolkova lmolkova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Celery seems to be a background task processing library rather than a messaging system. Messaging semantic conventions might not really apply to it since they are focused on common messaging scenarios involving network calls. I would imaging celery integrations for specific systems (like RabbitMQ) to either leverage underlying instrumentation for this library or, if celery needs to be instrumented, it should set the messaging system to the actual broker configured by the user.

@xrmx
Copy link
Contributor Author

xrmx commented Mar 4, 2025

Celery seems to be a background task processing library rather than a messaging system. Messaging semantic conventions might not really apply to it since they are focused on common messaging scenarios involving network calls. I would imaging celery integrations for specific systems (like RabbitMQ) to either leverage underlying instrumentation for this library or, if celery needs to be instrumented, it should set the messaging system to the actual broker configured by the user.

We already have a celery instrumentation but we were not setting the messaging.system attribute, see open-telemetry/opentelemetry-python-contrib#3265 . The original PR had queue as value which does not seem much informative.
We may be able to get the broker from the client, a popular broker for celery is redis, will it fit in that list?

@lmolkova
Copy link
Contributor

lmolkova commented Mar 5, 2025

We already have a celery instrumentation but we were not setting the messaging.system attribute, see open-telemetry/opentelemetry-python-contrib#3265 . The original PR had queue as value which does not seem much informative. We may be able to get the broker from the client, a popular broker for celery is redis, will it fit in that list?

yep, redis would fit.
If you have strong reasons to believe we need celery to be in the messaging.system, e.g. if there are some celery-specific behaviors, or the underlying broker is not known, please let me know. I'm happy to change my opinion

@shivanshuraj1333
Copy link
Member

shivanshuraj1333 commented Mar 5, 2025

I believe that since users typically configure the underlying broker (e.g., Redis, RabbitMQ) through Celery, the Python auto-instrumentation should extract as much relevant information as possible from the Celery instrumentation.

Currently, however, certain details, such as the broker URL are missing from the context.

Ideally, if the broker URL can be identified, it should be used to populate the messaging.system attribute. In scenarios where only the task processing library is instrumented (as opposed to, say, a Kafka or RMQ client), it might be more appropriate to set messaging.system to the task processing library. This approach helps avoid incomplete instrumentation. Also from the end user perspective, the broker URL is just for queuing the tasks.

If the sem conv says, that messaging.system attribute is reserved for brokers, we may need to think of some another attribute which helps in identifying the processing library.

PTAL at open-telemetry/opentelemetry-python-contrib#3265 (comment) I'm not sure if there's an actual way to extract broker info from the context.

@lmolkova
Copy link
Contributor

lmolkova commented Mar 5, 2025

@shivanshuraj1333 @xrmx I replied in the https://github.com/open-telemetry/opentelemetry-python-contrib/pull/3265/files#r1981937391 - looking into existing instrumentation I don't believe it has a significant intersection with messaging semconv - it only populates message id, conversation id and destination name from messaging, but populates a lot of celery-specific attributes.

I don't see how setting messaging.system would be helpful - backends will assume that spans follow messaging semconv, but these spans would lack the details.

I'm suggesting to think about generic task processing (which maybe in-process) provided by celery as a layer on top of messaging semconv.

@shivanshuraj1333
Copy link
Member

what could be the attribute name for it? any suggestions?

@lmolkova
Copy link
Contributor

lmolkova commented Mar 5, 2025

created #1963 - we should document the limitations of messaging semconv.

//cc @open-telemetry/semconv-messaging-approvers

@lmolkova
Copy link
Contributor

lmolkova commented Mar 5, 2025

what could be the attribute name for it? any suggestions?

could you elaborate? attribute name for what?

@shivanshuraj1333
Copy link
Member

shivanshuraj1333 commented Mar 5, 2025

if you run a demo application like this, you'd see attributes coming like below, which captures all the otel-python instrumented celery specific attributes:

- celery.action ⇒ `run`
- celery.delivery_info ⇒ `{'exchange': '', 'routing_key': 'distilled-guidance-dpo-25-10k-v1', 'priority': 0, 'redelivered': False}`
- celery.hostname ⇒ `gen1@bestapi-6dfffd8486-6h9x5`
- celery.reply_to ⇒ `86b14e3b-bbdb-3d12-8838-c6125a236bc6`
- celery.state ⇒ `SUCCESS`
- celery.task_name ⇒ `agent_telemetry_processor`
- messaging.conversation_id ⇒ `90054b2c-4e21-4edd-8999-605c1aa703b0`
- messaging.destination ⇒ `distilled-guidance-dpo-25-10k-v1`
- messaging.message_id ⇒ `90054b2c-4e21-4edd-8999-605c1aa703b0`

Now I see attributes with messaging.* in the backend to do correlations, but as per discussion we can not have messaging.system=celery.

Now one way to know that task processor celery is used is to check the presence of celery.* attributes or capture it under some attribute name. Is there a sem conv for that or we can have something like messaging.task.processor=celery

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Development

Successfully merging this pull request may close these issues.

3 participants