Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Telemetry event instead of predefined log message on batch timeout #1347

Merged

Conversation

katafrakt
Copy link
Contributor

I really love the work done about batch timeouts in #1298, with one exception I noted in a comment there. On a timeout, Absinthe produces a predefined log message on error level. This goes against my thinking that libraries generally should not produce logs on runtime (only on startup or maintenance actions, such as running mix tasks), especially when there's a better option available.

This PR suggests adding a new telemetry event [:absinthe, :middleware, :batch, :timeout] instead, so everyone can attach to it and do whatever they want (including logging the same message on the same level).

More context

I was about to bump Absinthe at our company project, but the approach with the log and not telemetry caused at least three hold-backs:

  1. The log message sometimes exceeds max size of the message our Datadog agent accepts, which leads to breaking it into two messages that cannot be parsed as valid JSON. As a results the logs are almost-useless for us. This is because we have some really big GraphQL queries, so output of inspect(batch_fun) is large.
  2. We have instrumentation on every error-level log message, but I'm not sure we want to trigger it in case of the timeout. So warning level, at least for evaluation period, would work much better here.
  3. We would, however, send the timeouts to Sentry with more detailed information. Telemetry event would allow it.

Other considerations

This is technically a breaking change, in case someone already relies on these log messages. But I think the capability was not advertised in changelog or anywhere, so perhaps it's safe to change. If not, perhaps we can at least give a configuration option to choose between the log message and telemetry.

@katafrakt katafrakt force-pushed the telemetry-for-batch-timeout branch from f2ced2e to b5de3b5 Compare November 18, 2024 17:38
@benwilson512
Copy link
Contributor

@katafrakt this is great! Can we add an item to the changelog with an example handler that would create the same sort of log message we currently do for anyone who wants to maintain backwards compatibility?

@katafrakt
Copy link
Contributor Author

@benwilson512 sure, I added something

Copy link
Contributor

@benwilson512 benwilson512 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thank you!

Instead of producing an eritrary log message with an arbitrary level, emit
a telemetry event [:absinthe, :middleware, :batch, :timeout], so everyone can
subscribe to it and do whatevet they want (including logging the same message
as was here before).
@katafrakt katafrakt force-pushed the telemetry-for-batch-timeout branch from 375b8ab to 8898d57 Compare November 19, 2024 09:07
@katafrakt
Copy link
Contributor Author

Reverted changes in typespec, as they were causing failures on the CI

@benwilson512 benwilson512 merged commit c5fc04c into absinthe-graphql:main Nov 21, 2024
6 checks passed
@benwilson512
Copy link
Contributor

Thank you!

@katafrakt katafrakt deleted the telemetry-for-batch-timeout branch November 22, 2024 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants