-
Notifications
You must be signed in to change notification settings - Fork 539
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tests failing for @aws-sdk/client-sqs >=3.316
#1477
Comments
cc @carolabadeer would you mind looking into this as you're listed as the code owner? 🙂 This could mean that the instrumentation is not producing the expected telemetry for the most recent versions of I'm able to produce this locally by installing either |
Thanks for raising this @pichlermarc! I will take a look into what could be causing this issue |
labeling p4 as this does not affect users |
@carolabadeer any update? |
Currently working with the AWS SDK for JS team to identify possible issues in the changes between @aws-sdk/client-sqs v3.315 and v3.316 that could be causing this test failure. Will post updates here on findings and progress |
While waiting I also looked into the behavior in the unit tests and found that there are two spans being used (not sure if they are meant to be the same) - a test span and a callback span. The callback span is the one whose attributes are being checked in the unit test that is failing. In that span, the test is checking for a
Since this is an S3 call, there shouldn't be a The test span, on the other hand, is the one wrapping the SQS ReceiveMessage call. That span contains the correct
These are my findings so far, but I am still curious about 2 questions:
// where span.attributes represents the attributes of the test span shown above
expect(span.attributes[SemanticAttributes.MESSAGING_OPERATION]).toMatch(
MessagingOperationValues.RECEIVE
); instead of expect(attributes[SemanticAttributes.MESSAGING_OPERATION]).toMatch(
MessagingOperationValues.RECEIVE
); @blumamir since you added this unit test, maybe you can give us some insight?
|
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days. |
why are we seeing S3 CreateBucket span in this test?
So what this part actually asserts is that the callback itself is being invoked with the receive span set as the active span for the callback invocation. The issue is that any operations triggered inside the callback, like calling I hope it is clear, and feel free to refactor the test so it better communicate it's intent. We can maybe create a new span in the callback and assert child-parent relationships, or we can just compare the active span id in the callback with the test span? I hope I don't miss anything there.
To me the immediate suspect is this S3 create bucket which seems unrelated. I would add a breakpoint and trace the stack to follow it's roots. Feel free to ping me in slack to research together. |
I opened #1711 to disable this test temporarily. If the PR merges, we'll need to remove the |
I have been looking at this a little bit. I don't have a fix yet, however. Also, see #1838 (comment) about a different test failure with more recent versions of
That failure, I believe, is because a recent-ish version of |
I believe I understand the issue with the v3.316.0 failure now, but I'm not sure how easily it can be fixed. I'll try to explain, and then I have a question about whether we even want to support this. (/cc @blumamir) the feature that is brokenThe aws-sdk instrumentation has special handling for SQS ReceiveMessage described here: https://github.com/open-telemetry/opentelemetry-js-contrib/blob/main/plugins/node/opentelemetry-instrumentation-aws-sdk/doc/sqs.md#receivemessage
What this means is that the instrumentation attempts to make it so: sqsClient.receiveMessage(params).then(res => {
// ...the current span context in here is the ReceiveMessage span.
}); The way that is done currently is by shimming the
the reason for breakageThis broke in class SQS extends SQSClient {
receiveMessage(args, optionsOrCb, cb) {
const command = new ReceiveMessageCommand_1.ReceiveMessageCommand(args);
return this.send(command, optionsOrCb);
} to this: class SQS extends SQSClient {
async receiveMessage(args, optionsOrCb, cb) {
const command = new ReceiveMessageCommand_1.ReceiveMessageCommand(args);
return this.send(command, optionsOrCb);
} The current possible fixA possible fix might be (I haven't tried this) to explicitly wrap that However, IIUC, there will always be possible context loss around some async/await usage patterns. This is mentioned in the "sqs.md" doc:
Also, tests for this for aws-sdk v2 are skipped (and have always been, I believe): opentelemetry-js-contrib/plugins/node/opentelemetry-instrumentation-aws-sdk/test/sqs.test.ts Lines 136 to 162 in fb80783
Discussion: do we want to support this?There are two special handling things being done by SQS ReceiveMessage instrumentation:
The latter is attempted by monkey-patching the sqs.receiveMessage(...).then(res => {
res.Messages.forEach(msg => {
console.log('one message is', msg);
});
}); I say "attempted" because the instrumentation is necessarily limited. This does not produce processing spans: sqs.receiveMessage(...).then(res => {
for (let msg of res.Messages) {
console.log('one message is', msg);
};
}); There is already a (very old) discussion issue for this: #707
I understand the desire to try to implement the OTel message spec guidelines to create processing spans automatically. However, I think the limited ability to do so in JavaScript results in surprising behaviour such that we might want to consider not doing this limited special handling at all. This would be a breaking change. Actually, just looking at the OTel Messaging spec (https://opentelemetry.io/docs/specs/semconv/messaging/messaging-spans/) now, has guidance for "processing" spans been dropped? Compare https://opentelemetry.io/docs/specs/semconv/messaging/messaging-spans/#batch-receiving to the older v1.20.0 equivalent: https://github.com/open-telemetry/opentelemetry-specification/blob/v1.20.0/specification/trace/semantic_conventions/messaging.md#batch-processing |
The "fix" or removal of the feature (see previous comment) isn't going to be a quick thing. I think in the interim we should disable this particular test. I believe we can skip just this test block:
this would be slight change to #1711. @pichlermarc Would you object to a change pushed to your #1711 to disable just the smaller part of aws-sdk-v3.test.ts that is failing? |
I don't have permissions to push to your branch, so I created #1847 as an alternative. |
Ah, yes that's not possible with organization forks. :/ |
…sqs ReceiveMessage context handling (#1847) In `@aws-sdk/client-sqs` v3.316 the SQS client methods became async. This breaks the `utils.bindPromise` usage that attempts to propagate the SQS ReceiveMessage span context to the user's handler for the method's returned promise. Fixing that propagation is for #707 or another issue. This change is a workaround that skips the testing of that span context propagation feature. Fixes: #1477 Refs: #707 Co-authored-by: Marc Pichler <[email protected]>
What version of OpenTelemetry are you using?
Current revision on
main
What version of Node are you using?
Node 16, 18
What did you do?
npm run test-all-versions
in@opentelemetry/instrumentation-aws-sdk
What did you expect to see?
Tests pass.
What did you see instead?
Tests fail for
@aws-sdk/client-sqs
>=3.316
Additional context
Run for Node 16: https://github.com/open-telemetry/opentelemetry-js-contrib/actions/runs/4787435108/jobs/8512667684
Run for Node 18: https://github.com/open-telemetry/opentelemetry-js-contrib/actions/runs/4787435108/jobs/8512667928
The text was updated successfully, but these errors were encountered: