-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AWSClient segfaults after a data race #3084
Comments
Can you try adding brackets around the sdk code so that it is all in the same scope and is cleaned up correctly before
|
Hi @k0zmo , thank you for bringing and describing in details this issue. I agree with @jmklix in referring to our basic-use instruction: this CPP SDK is designed in a such a way that it requires extra scope, and extra caution in tracking async running tasks from the user. However, this is a recurring topic, so I submitted a PR #3087 having your suggestion to re-arrange the destruction order, as well as, in addition, some refactoring to use factories by default within the ClientConfiguration, and some other minor changes to track client "is initialized" checks. I hope we are going to merge this PR soon, while it is not going to solve all "Init-Shutdown-memory-allocator-async-tasks-tracking" kind of issues, it is still an improvement. Best regards, |
@jmklix Limiting a scope for AWSClient like you shown doesn't make difference. The problem I reported comes from an entangled lifetime between ClientConfiguration and AWSClient, not AWSClient and the global SDK state. |
This issue is now closed. Comments on closed issues are hard for our team to see. |
Describe the bug
Following simple code that starts a few async calls and then interrupts them segfaults (randomly):
There are two reasons for the segfault. One is extended lifetime of executor -
ClientConfiguration
creates ashared_ptr
toExecutor
, passes it onto the client but because its lifetime is longer than a client, it makes theExecutor
to live longer than the client.However, there's an implicit assumption when destroying the client that the executor will run to its completion (at least for already started packaged_tasks) as part of its destructor. This obviously is not invoked as
Executor
outlives the client. The most obvious fix is to addclient.executor.reset()
after creating an instance of client so it is the only owner ofExecutor
.The problem stems from using
shared_ptr
and simultaneously assuming it is the only owner - which cannot be guaranteed.Secondly, there's another data race (also leading to segfault pretty consistently) in
ClientWithAsyncTemplateMethods::ShutdownSdkClient
which is called as part ofS3Client::~S3Client()
. There, we first release them_endpointProvider
and thenm_executor
(but only if it's the only owner). This is wrong asm_endpointProvider
can still be used by threads started by theExecutor
- such as:The fix here is simply to move the
m_endpointProvider.reset()
call afterm_executor
is released:Running original version with thread sanitizer yields more than 100 warnings while the above changes gives none for attached sample code
Expected Behavior
No segfault nor datarace
Current Behavior
Above code segfault in various places, most likely on dereferencing already released EndpointProvider but also deep inside the AttemptExhaustively function on dereferencing RetryStrategy
Reproduction Steps
Run above code few times and observe segfaults
Possible Solution
No response
Additional Information/Context
No response
AWS CPP SDK version used
AWS SDK CPP, version 1.11.352
Compiler and Version used
gcc 11.4.0, Visual Studio 17.11, clang 14.0
Operating System and version
Ubuntu 22.04 and Windows 11
The text was updated successfully, but these errors were encountered: