-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance Thread Safety in Core Data Management #12805
base: trunk
Are you sure you want to change the base?
Conversation
…tContainer` and its contexts are now thread-safe, using a dedicated synchronization queue (`syncQueue`).
📲 You can test the changes from this Pull Request in WooCommerce iOS by scanning the QR code below to install the corresponding build.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this !
What would be the trade-offs of following this approach?
My understanding is that one of the trade-offs for safety would be a performance hit due to the overhead of queue synchronisation (which should be fine in our case right now, but we should find a way to measure it).
My main concern is that while we would be sure to run all these processes asynchronously in the syncQueue to assure safety, this would also block the thread until the operation is completed, isn't it? Since our action dispatcher always runs in the main thread by design, would upsert-related actions potentially be a source of deadlocks?
|
||
func test_viewStorage_is_thread_safe() { | ||
// Given | ||
let manager = CoreDataManager(name: storageIdentifier, crashLogger: MockCrashLogger()) | ||
let expectation = self.expectation(description: "Thread Safety") | ||
|
||
// When | ||
DispatchQueue.global(qos: .background).async { | ||
let viewContext = manager.viewStorage as? NSManagedObjectContext | ||
// Then | ||
XCTAssertNotNil(viewContext) | ||
expectation.fulfill() | ||
} | ||
|
||
wait(for: [expectation], timeout: Constants.expectationTimeout) | ||
} | ||
|
||
func test_reset_is_thread_safe() { | ||
// Given | ||
let manager = CoreDataManager(name: storageIdentifier, crashLogger: MockCrashLogger()) | ||
let expectation = self.expectation(description: "Thread Safety") | ||
|
||
// When | ||
DispatchQueue.global(qos: .background).async { | ||
manager.reset() | ||
// Then | ||
expectation.fulfill() | ||
} | ||
|
||
wait(for: [expectation], timeout: Constants.expectationTimeout) | ||
} | ||
|
||
func test_concurrent_access_to_core_data_manager() { | ||
// Given | ||
let manager = CoreDataManager(name: storageIdentifier, crashLogger: MockCrashLogger()) | ||
let expectation1 = self.expectation(description: "Thread 1") | ||
let expectation2 = self.expectation(description: "Thread 2") | ||
|
||
// When | ||
DispatchQueue.global(qos: .background).async { | ||
manager.reset() | ||
// Then | ||
expectation1.fulfill() | ||
} | ||
|
||
DispatchQueue.global(qos: .background).async { | ||
let viewContext = manager.viewStorage as? NSManagedObjectContext | ||
// Then | ||
XCTAssertNotNil(viewContext) | ||
expectation2.fulfill() | ||
} | ||
|
||
wait(for: [expectation1, expectation2], timeout: Constants.expectationTimeout) | ||
} | ||
|
||
func test_saving_operations_are_thread_safe() { | ||
// Given | ||
let manager = CoreDataManager(name: storageIdentifier, crashLogger: MockCrashLogger()) | ||
let expectation = self.expectation(description: "Thread Safety") | ||
|
||
// When | ||
DispatchQueue.global(qos: .background).async { | ||
let derivedContext = manager.writerDerivedStorage as? NSManagedObjectContext | ||
derivedContext?.performAndWait { | ||
_ = derivedContext?.insertNewObject(ofType: ShippingLine.self) | ||
derivedContext?.saveIfNeeded() | ||
} | ||
// Then | ||
expectation.fulfill() | ||
} | ||
|
||
wait(for: [expectation], timeout: Constants.expectationTimeout) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite sure how useful are these tests or if are actually testing the changes as we intend to, if we remove the syncQueue.sync
in this PR and re-run the tests, all of them still pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite sure how useful are these tests or if are actually testing the changes as we intend to, if we remove the syncQueue.sync in this PR and re-run the tests, all of them still pass.
I'm not sure why I lost this comment. By the way, @iamgabrielma, you're right, but at the same time, it's quite difficult to replicate thread safety issues, so these tests serve more as a sort of sanity check rather than unit tests. By the way, if you have any suggestions for improvement, I'm open to hearing them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely tricky.
I don't have a better suggestion for the test themselves, but I think that we should change the names at a minimum so they doesn't conflate that are testing something that aren't, in this case we don't know if the operations are thread safe, for example: Accessing viewStorage
from a background thread and verifying its non-nil state does not confirm thread-safety, or the "reset" test just demonstrates that the function completes without immediate errors on a background thread, not its thread safety.
Which bring the point of: Are these tests really necessary? If they are, they should be renamed to not miss-guide what they test.
expectation.fulfill() | ||
} | ||
|
||
wait(for: [expectation], timeout: Constants.expectationTimeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see expectationTimeout
has been with us for a long time, but 10 seconds feels extreme. Having a specific number of seconds for timeout is arbitrary, but if we change it to 1 second the test suite still passes, and around 0.1-0.5 starts to become flaky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This constant is used almost everywhere in the unit tests, so if I need a custom timeout for these methods, I would prefer to set it individually rather than change it globally. This is because tests might still pass on your local machine but could require more time on the CI server. In any case, for most methods where it is applied, this value simply acts as the timeout limit. However, if a test is successful, it typically doesn't require the full duration of this timeout to complete. I would make a separate PR to change the timeout since it impacts hundreds of tests
|
||
// Then | ||
wait(for: [expectation1, expectation2, expectation3], timeout: 5.0) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as with the above tests, if we remove the syncqueue changes to make the calls single-threaded this test still passes, so it's unlikely that's testing the changes in the way we want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this either. How about using DispatchQueue.concurrentPerform
like this suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried it, implementing it in this way:
func test_thread_safety_for_readOnlyEntity() {
// Given: A sample storage account is inserted and saved.
let storageAccount = storageManager.insertSampleAccount()
viewContext.saveIfNeeded()
// And: An EntityListener is set up with the readOnly entity being the inserted storage account.
let listener = EntityListener(viewContext: viewContext, readOnlyEntity: storageAccount.toReadOnly())
// When: Multiple threads access the readOnlyEntity property concurrently.
let iterations = 100
let expectation = self.expectation(description: "Concurrent access")
DispatchQueue.global(qos: .userInitiated).async {
DispatchQueue.concurrentPerform(iterations: iterations) { index in
print("Iteration \(index)")
_ = listener.readOnlyEntity
}
expectation.fulfill()
}
// Then: If no crashes or exceptions occurred, the test is considered successful.
wait(for: [expectation], timeout: 10.0)
XCTAssert(true)
}
But no luck, the test still passes. I would say that it's probably also "normal"? Since it's quite difficult to replicate thread safety issues?
let container = NSPersistentContainer(name: name, managedObjectModel: modelsInventory.currentModel) | ||
container.persistentStoreDescriptions = [storeDescription] | ||
|
||
let migrationDebugMessages = migrateDataModelIfNecessary(using: container.persistentStoreCoordinator) | ||
let container = NSPersistentContainer(name: name, managedObjectModel: modelsInventory.currentModel) | ||
container.persistentStoreDescriptions = [storeDescription] | ||
|
||
container.loadPersistentStores { [weak self] (storeDescription, error) in | ||
guard let `self` = self, let persistentStoreLoadingError = error else { | ||
return | ||
} | ||
let migrationDebugMessages = migrateDataModelIfNecessary(using: container.persistentStoreCoordinator) | ||
|
||
DDLogError("⛔️ [CoreDataManager] loadPersistentStore failed. Attempting to recover... \(persistentStoreLoadingError)") | ||
|
||
/// Remove the old Store which is either corrupted or has an invalid model we can't migrate from | ||
/// | ||
var persistentStoreRemovalError: Error? | ||
do { | ||
try container.persistentStoreCoordinator.destroyPersistentStore(at: self.storeURL, | ||
ofType: storeDescription.type, | ||
options: nil) | ||
} catch { | ||
persistentStoreRemovalError = error | ||
} | ||
|
||
/// Retry! | ||
/// | ||
container.loadPersistentStores { [weak self] (storeDescription, underlyingError) in | ||
guard let underlyingError = underlyingError as NSError? else { | ||
container.loadPersistentStores { [weak self] (storeDescription, error) in | ||
guard let `self` = self, let persistentStoreLoadingError = error else { | ||
return | ||
} | ||
|
||
let error = CoreDataManagerError.recoveryFailed | ||
DDLogError("⛔️ [CoreDataManager] loadPersistentStore failed. Attempting to recover... \(persistentStoreLoadingError)") | ||
|
||
/// Remove the old Store which is either corrupted or has an invalid model we can't migrate from | ||
/// | ||
var persistentStoreRemovalError: Error? | ||
do { | ||
try container.persistentStoreCoordinator.destroyPersistentStore(at: self.storeURL, | ||
ofType: storeDescription.type, | ||
options: nil) | ||
} catch { | ||
persistentStoreRemovalError = error | ||
} | ||
|
||
/// Retry! | ||
/// | ||
container.loadPersistentStores { [weak self] (storeDescription, underlyingError) in | ||
guard let underlyingError = underlyingError as NSError? else { | ||
return | ||
} | ||
|
||
let error = CoreDataManagerError.recoveryFailed | ||
let logProperties: [String: Any?] = ["persistentStoreLoadingError": persistentStoreLoadingError, | ||
"persistentStoreRemovalError": persistentStoreRemovalError, | ||
"retryError": underlyingError, | ||
"appState": UIApplication.shared.applicationState.rawValue, | ||
"migrationMessages": migrationDebugMessages] | ||
self?.crashLogger.logFatalErrorAndExit(error, | ||
userInfo: logProperties.compactMapValues { $0 }) | ||
} | ||
|
||
let logProperties: [String: Any?] = ["persistentStoreLoadingError": persistentStoreLoadingError, | ||
"persistentStoreRemovalError": persistentStoreRemovalError, | ||
"retryError": underlyingError, | ||
"appState": UIApplication.shared.applicationState.rawValue, | ||
"migrationMessages": migrationDebugMessages] | ||
self?.crashLogger.logFatalErrorAndExit(error, | ||
userInfo: logProperties.compactMapValues { $0 }) | ||
self.crashLogger.logMessage("[CoreDataManager] Recovered from persistent store loading error", | ||
properties: logProperties.compactMapValues { $0 }, | ||
level: .info) | ||
} | ||
|
||
let logProperties: [String: Any?] = ["persistentStoreLoadingError": persistentStoreLoadingError, | ||
"persistentStoreRemovalError": persistentStoreRemovalError, | ||
"appState": UIApplication.shared.applicationState.rawValue, | ||
"migrationMessages": migrationDebugMessages] | ||
self.crashLogger.logMessage("[CoreDataManager] Recovered from persistent store loading error", | ||
properties: logProperties.compactMapValues { $0 }, | ||
level: .info) | ||
} | ||
|
||
return container |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This diff seems unnecessary and just adds mental overhead to review the PR/find changes in the history.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This diff seems unnecessary and just adds mental overhead to review the PR/find changes in the history.
I think there was a wrong indentation in the code, that has been like that for years
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the updated indent is incorrect, can you revert that please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check if they have been fixed now?
Co-authored-by: Gabriel Maldonado <[email protected]>
Co-authored-by: Gabriel Maldonado <[email protected]>
Thanks, @iamgabrielma for the review!
As you said, there is a trade-off. Using Wanting to be more pragmatic, the main problem I want to solve all the way, is thread safety, which must be guaranteed by us, because by default each To address the problem and determine if it's truly a thread safety issue, we could also use PS: discovered for instance, |
childManagedObjectContext.parent = persistentContainer.viewContext | ||
childManagedObjectContext.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy | ||
return childManagedObjectContext | ||
return syncQueue.sync { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using DispatchQueue
and closures, how about making CoreDataManager
an actor
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another possibility, but do you see a real benefit over DispatchQueue
? I'm concerned that making this class an actor might entail many changes elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A benefit is to have a more modern syntax with no requirement for using closures, making the code more readable. I understand the potential side effects of changes required at call sites though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe, and let me know if you agree, that if this PR is merged and it fixes the crashes on Core Data, it's worth investing some time later in migrating it to actor
, unless the impact is huge.
…786-make-coredata-even-more-thread-safe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @pmusolino - I'm not confident about the changes making all access to the storage sync instead of async due to the potential hang this could cause, so I cannot approve this PR. I'll ping random folks here for more opinions on this @selanthiraiyan @jaclync @joshheald.
guard let note = ManagedObjectsDidChangeNotification(notification: notification) else { | ||
return | ||
} | ||
queue.sync { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❓ Do we also need a queue here? i.e. Do we have to execute these callbacks synchronously?
I have update the target milestone to 19.4 as part of the release process for 19.3. |
Generated by 🚫 Danger |
I'll remove the tag for the moment until the PR is ready for a specific milestone. |
…786-make-coredata-even-more-thread-safe
…ps://github.com/woocommerce/woocommerce-ios into issue/12786-make-coredata-even-more-thread-safe
@@ -15,6 +15,9 @@ public final class CoreDataManager: StorageManagerType { | |||
|
|||
private let modelsInventory: ManagedObjectModelsInventory | |||
|
|||
// A dispatch queue for synchronizing access to shared attributes | |||
private let syncQueue = DispatchQueue(label: "com.automattic.woocommerce.CoreDataManager.syncQueue") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This queue is used to prevent concurrent access to the read context and write context themselves. They can still be used to perform concurrent Core Data changes.
If I understand it correctly, you'd want synchronized data writing to prevent database corruption. I don't think this new serial queue will help that.
NotificationCenter.default.post(name: .StorageManagerDidResetStorage, object: self) | ||
DDLogVerbose("💣 [CoreDataManager] Stack Destroyed!") | ||
NotificationCenter.default.post(name: .StorageManagerDidResetStorage, object: self) | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there is a potential dead lock here.
Imagine reset
is called on the main thread. The main thread now waits on the syncQueue
to finish running. The syncQueue
waits viewContext
to complete. If the viewContext
is bound to the main context, it waits for the main thread, which is blocked.
The above scenario is pretty easy to set up in unit tests. I'd suggesting adding an unit test to see if there is a dead-lock here.
Closes: #12786
Description
It seems there are multiple crashes in the app, even very old crashes, which have to do with issues such as data corruption and deadlocks related to CoreData. #12785 is one of these (extra info here peaMlT-DX), and as usual, this is one of those crashes that is impossible to replicate, since concurrency issues often involve race conditions, where the timing of multiple threads accessing or modifying shared resources leads to unpredictable behavior. The exact timing of these operations can vary with each run of the application, making it hard to consistently reproduce the issue.
In this PR I attempt a series of enhancements to our Core Data management strategy, focusing on improving thread safety and resolving potential deadlocks, both on
CoreDataManager
andEntityListener
. These changes should ensure that Core Data operations are performed in a safe and efficient manner across different threads.For reviewing this PR -which must make sense to you, there is also the possibility that this code will not be merged at the end-; I added all the people who in the past have been involved with major changes to these classes, or participated in debugging some of these crashes. But given also the potential impact under the hood of these changes, I need your opinion and extra extra pair of eyes. <- If this PR is merged and we notice no impact at all, I can revert the changes.
Key Changes
syncQueue
) is now utilized to manage access to thepersistentContainer
and its contexts, ensuring thread-safe operations. The dispatch queue ensures that only one thread can access or modify the shared resources (e.g.,persistentContainer
,viewContext
) at a time.NSPersistentContainer
.CoreDataManager
, the other class that should be thread safe I think it should beEntityListener
, so also here I introduced the same synchronization queue.Testing instructions
RELEASE-NOTES.txt
if necessary.