Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix race condition in MQTT protocol when sending messages #1094
Fix race condition in MQTT protocol when sending messages #1094
Changes from all commits
e029ccf
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm understanding correctly, the race condition is that the
publishOption
can be changed before the message is published?Since there is only one place to set this field in the struct (
sdk-go/protocol/mqtt_paho/v2/option.go
Lines 29 to 37 in e6a74ef
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Left a comment in the issue to better understand the issue/root cause. What is causing the race? Why is
publishOption
changing during concurrent calls? Asking all these questions because I didn't write the implementation and not familiar with MQ. Furthermore, if we change as per the above, I wonder what the implication is for existing users because we're changing the semantics here (copy instead of pointer).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Cali0707! The code block you referred to is only invoked once when initializing the
client
. The issues arose when sending messages in many goroutines. Here is the detailThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI @embano1.
I append the root cause here.
The
publishOption
is a pointer reference by the MQTT message, which is used to hold the CloudEvent payload. The message(publishOption
) is shared across goroutines and will be changed by them.The
publishOption
is a pointer referenced by the MQTT message, which holds the CloudEvent payload. So the message (publishOption
) is shared across multiple goroutines and can be modified by them.By making the suggested changes, the user should not perceive anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please point me to the code/line where this happens?
Also, ideally you also write a test please to surface the current issue (panics) and how this PR fixes it, perhaps there's more issues due to the current implementation which we can detect this way 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@embano1 No worries!
Here’s the code line where the panic occurs.
I’ve also written an integration test that covers the concurrent panic issue, which you can use to reproduce it.