-
-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Submission duplication: add advisory lock on xml_hash #859
Open
RobertoMaurizzi
wants to merge
1
commit into
kobotoolbox:main
Choose a base branch
from
catalpainternational:submission_duplication_race
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Submission duplication: add advisory lock on xml_hash #859
RobertoMaurizzi
wants to merge
1
commit into
kobotoolbox:main
from
catalpainternational:submission_duplication_race
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This has been tested on what we have in production, version 2.022.24a |
rajpatel24
added a commit
to kobotoolbox/kpi
that referenced
this pull request
Jan 15, 2025
## Summary Implemented logic to detect and reject duplicate submissions. ## Description We have identified a race condition in the submission processing that causes duplicate submissions with identical UUIDs and XML hashes. This issue is particularly problematic under conditions with multiple remote devices submitting forms simultaneously over unreliable networks. To address this issue, a PR has been raised with the following proposed changes: - Race Condition Resolution: A locking mechanism has been added to prevent the race condition when checking for existing instances and creating new ones. This aims to eliminate duplicate submissions. - UUID Enforcement: Submissions without a UUID are now explicitly disallowed. This ensures that every submission is uniquely identifiable and further mitigates the risk of duplicate entries. - Introduction of `root_uuid`: - To ensure a consistent submission UUID throughout its lifecycle and prevent duplicate submissions with the same UUID, a new `root_uuid` column has been added to the `Instance` model with a unique constraint (`root_uuid` per `xform`). - If the `<meta><rootUuid>` is present in the submission XML, it is stored in the `root_uuid` column. - If `<meta><rootUuid>` is not present, the value from `<meta><instanceID>` is used instead. - This approach guarantees that the `root_uuid` remains constant across the lifecycle of a submission, providing a reliable identifier for all instances. - UUID Handling Improvement: Updated the logic to strip only the `uuid:` prefix while preserving custom, non-UUID ID schemes (e.g., domain.com:1234). This ensures compliance with the OpenRosa spec and prevents potential ID collisions with custom prefixes. - Error Handling: - 202 Accepted: Returns when content is identical to an existing submission and successfully processed. - 409 Conflict: Returns when a duplicate UUID is detected but with differing content. These changes should improve the robustness of the submission process and prevent both race conditions and invalid submissions. ## Notes - Implemented a fix to address the race condition that leads to duplicate submissions with the same UUID and XML hash. - Incorporated improvements from existing work, ensuring consistency and robustness in handling concurrent submissions. - The fix aims to prevent duplicate submissions, even under high load and unreliable network conditions. ## Related issues Supersedes [kobotoolbox/kobocat#876](kobotoolbox/kobocat#876) and kobotoolbox/kobocat#859 --------- Co-authored-by: Olivier Leger <[email protected]>
magicznyleszek
added a commit
to kobotoolbox/kpi
that referenced
this pull request
Jan 17, 2025
commit 11ee676 Author: Rebecca Graber <[email protected]> Date: Thu Jan 16 08:47:09 2025 -0500 feat(projectHistoryLogs): log new submissions (#5416) ### 📣 Summary Create logs when new submissions are added to projects. ### 👷 Description for instance maintainers Allow null user_uids in AuditLogs so we can log anonymous submissions. ### 💭 Notes We previously had no need for null users in audit logs because the actions we logged were all restricted to authenticated users, but since we allow anonymous submissions, we needed a way to log those. ### 👀 Preview steps Feature/no-change template: 1. ℹ️ have an account and a project. Make sure the account username is not `admin` (see [this notion task](https://www.notion.so/kobotoolbox/Anonymous-submissions-dont-work-if-user-named-admin-owns-asset-1767e515f65480608dfcee76ba9b3710?pvs=4)) 2. Deploy the project 3. Add a submission to the project 4. Go to `api/v2/asset/<asset-uid>/history` 5. 🟢 There should be a new project history log with `action='add-submission'` and all the usual metadata, plus ``` "submission": { "submitted_by": "user1" } ``` 6. Enable submissions without username/email to the project 7. To make sure you're submitting anonymously, copy and paste the enketo link into a new private tab and add a new submission 8. 🟢 Reload the endpoint. There should be a new audit log with `action='add-submission'` a. The user should be `http://kf.kobo.local:8080/api/v2/users/AnonymousUser/` b. The user_uid will be the uid of the anonymous user in the database c. The username should be `AnonymousUser` d. The metadata should contain `{"submission": {"submitted_by": "AnonymousUser"}` in addition to the usual commit bbfdaf1 Author: olive-KTB <[email protected]> Date: Thu Jan 16 02:49:14 2025 +0100 update gitlab-ci.yml commit bc56f8f Author: Akuukis <[email protected]> Date: Wed Jan 15 10:50:39 2025 +0200 refactor(frontend): Mantine Component Library PoC (#5344) ### 💭 Notes Please read the PR commit-by-commit, here's a guide. 1. [3105f47](3105f47) Setup Mantine Component Library. Please install the new dependencies, and add recommended VSCode extensions. 2. [453f2c1](453f2c1) Here's a preview that Mantine in general works, and API for the example Button is different but generally similar. | our button | Mantine default button | |--------|--------| | ![image](https://github.com/user-attachments/assets/1f3e5736-c5eb-4cbf-a56e-a393dab561d3) | ![image](https://github.com/user-attachments/assets/c8f2a9dc-cf1a-44c3-881a-ff05da433060) | ![image](https://github.com/user-attachments/assets/e2abe407-3113-4900-9027-43186137ce13) 3. [d782e38](d782e38) Example of custom styled component, Button. IMHO achieves pixel-perfect match in storybook and example above, except for line breaks, spinner and hover animations. Click animation matches out of box. Icon-only buttons are omitted because Mantine uses a different component `IconAction` for those. | Original on left / Mantine implementation on right | | --- | | ![image](https://github.com/user-attachments/assets/9a58b7af-3ff4-4b18-995c-6c57ff4264cb) | 5. [eaf45e3](eaf45e3) Wrapped Button to add support for inbuilt Tooltip. No idea if we want to move forward with these two coupled, but I found it useful for comparison by re-implementing part of old Button behavior that's represented in storybook. | Original | Mantine implementation | | --- | --- | | ![image](https://github.com/user-attachments/assets/7bf2a504-902b-4801-a2dd-3e7648166316) | ![image](https://github.com/user-attachments/assets/41bc4dff-acf3-4328-ab6f-f7fcccbde20b) | ### 👀 Preview steps 1. ℹ️ open Kobo home 4. 🟢 [on main] notice the original "new" button 6. 🟢 [on PR] notice the new "new" button with default mantine style, only slightly different --------- Co-authored-by: Leszek Pietrzak <[email protected]> Co-authored-by: Leszek <[email protected]> Co-authored-by: James Kiger <[email protected]> Co-authored-by: Paulo Amorim <[email protected]> Co-authored-by: James Kiger <[email protected]> commit 9598180 Author: Raj Patel <[email protected]> Date: Wed Jan 15 13:29:03 2025 +0530 fix!: reject duplicate submissions (#5047) ## Summary Implemented logic to detect and reject duplicate submissions. ## Description We have identified a race condition in the submission processing that causes duplicate submissions with identical UUIDs and XML hashes. This issue is particularly problematic under conditions with multiple remote devices submitting forms simultaneously over unreliable networks. To address this issue, a PR has been raised with the following proposed changes: - Race Condition Resolution: A locking mechanism has been added to prevent the race condition when checking for existing instances and creating new ones. This aims to eliminate duplicate submissions. - UUID Enforcement: Submissions without a UUID are now explicitly disallowed. This ensures that every submission is uniquely identifiable and further mitigates the risk of duplicate entries. - Introduction of `root_uuid`: - To ensure a consistent submission UUID throughout its lifecycle and prevent duplicate submissions with the same UUID, a new `root_uuid` column has been added to the `Instance` model with a unique constraint (`root_uuid` per `xform`). - If the `<meta><rootUuid>` is present in the submission XML, it is stored in the `root_uuid` column. - If `<meta><rootUuid>` is not present, the value from `<meta><instanceID>` is used instead. - This approach guarantees that the `root_uuid` remains constant across the lifecycle of a submission, providing a reliable identifier for all instances. - UUID Handling Improvement: Updated the logic to strip only the `uuid:` prefix while preserving custom, non-UUID ID schemes (e.g., domain.com:1234). This ensures compliance with the OpenRosa spec and prevents potential ID collisions with custom prefixes. - Error Handling: - 202 Accepted: Returns when content is identical to an existing submission and successfully processed. - 409 Conflict: Returns when a duplicate UUID is detected but with differing content. These changes should improve the robustness of the submission process and prevent both race conditions and invalid submissions. ## Notes - Implemented a fix to address the race condition that leads to duplicate submissions with the same UUID and XML hash. - Incorporated improvements from existing work, ensuring consistency and robustness in handling concurrent submissions. - The fix aims to prevent duplicate submissions, even under high load and unreliable network conditions. ## Related issues Supersedes [kobotoolbox/kobocat#876](kobotoolbox/kobocat#876) and kobotoolbox/kobocat#859 --------- Co-authored-by: Olivier Leger <[email protected]> commit d22b8b5 Merge: 89bd9b7 b4aa1b7 Author: John N. Milner <[email protected]> Date: Tue Jan 14 15:20:49 2025 -0500 Merge remote-tracking branch 'origin/release/2.024.36' commit 89bd9b7 Author: Rebecca Graber <[email protected]> Date: Tue Jan 14 15:11:37 2025 -0500 fix(auditLogs): correctly serialize audit logs from deleted users (#5418) ### 📣 Summary Fixes a 500 error from the various audit log endpoints when there are actions by deleted users. ### 📖 Description Return empty user and username fields in the response if the user was deleted after the log was created. This applies to `/api/v2/audit-logs`, `api/v2/assets/<uid>/history`, and `api/v2/project-history-logs`. ### 💭 Notes Small fix in the serializer. Also updates the ProjectHistoryLog serializer to inherit from the AuditLogSerializer so we don't have to duplicate the method fields. ### 👀 Preview steps Bug template: 1. ℹ️ have a super user account and a project 2. Create a new user (user1) and give them the `Edit Form` permission on the project. 3. Log in as user1 and make an edit to the project. 4. Log out user1 and log back in as the super user 5. Delete user1. You can do this from the admin page if you delete the user from the User list, then from the Trash Bin. 6. Go to: a. `api/v2/audit-logs` b. `api/v2/project-history-logs` c. `api/v2/assets/<uid>/history` 7. 🔴 [on main] All will return a 500 error (`AttributeError: 'NoneType' object has no attribute 'username'`) 8. 🟢 [on PR] The endpoint will return the expected logs. For all user1's actions, the user and username fields will be empty. but the user_uid should still refer to the old user. commit b4aa1b7 Author: jnm <[email protected]> Date: Tue Jan 14 15:01:18 2025 -0500 feat: export background-geopoint as GPS field (#5420) See kobotoolbox/formpack#327. This change just updates the formpack commit hash used by KPI commit dddd619 Author: Olivier Léger <[email protected]> Date: Tue Jan 14 14:10:31 2025 -0500 fix: catch additional XLSForm validation errors during deployment (#5419) ### 📣 Summary Enhanced error handling to catch more validation errors in XLSForm during deployment. ### 📖 Description Validation error handling for XLSForm deployment has been enhanced to catch a wider range of issues. This prevents the display of a generic 500 error in the deployment modal and instead returns the explicit error message. ### Notes Supersedes #5417, #5411 and #5403 commit 9189ac9 Author: Olivier Léger <[email protected]> Date: Tue Jan 14 09:40:34 2025 -0500 fix: handle case sensitivity for "Settings" sheet name with explicit error TASK-1353 (#5417) ### 📣 Summary Improved error handling for case-sensitive "Settings" sheet names. ### 📖 Description This update addresses an issue where a sheet named "Settings" with uppercase or mixed case letters causes unexpected behavior. An explicit error message is now raised to alert users of the case sensitivity, ensuring they can resolve the issue easily. commit 8e8d6bb Author: Rebecca Graber <[email protected]> Date: Mon Jan 13 15:02:04 2025 -0500 test: rename admin user in fixture (#5415) ### 💭 Notes Developer-facing changes only. Changes the username of the admin user to `adminuser` in preparation for disallowing the name `admin` as part of https://www.notion.so/kobotoolbox/Anonymous-submissions-dont-work-if-user-named-admin-owns-asset-1767e515f65480608dfcee76ba9b3710
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The problem
During a project we're managing for a national government we noticed several duplicated submissions with the same uuid and xml_hash, as describe in #470
After a bit of investigation, we found a race condition caused by transaction isolation between different sessions: in https://github.com/kobotoolbox/kobocat/blob/main/onadata/libs/utils/logger_tools.py#L177 there's a check to find other submissions with the same xml_hash from the same user, however this check won't be able to detect other submissions being created concurrently because they're all in different database transactions, "since
ATOMIC_REQUESTS
is set toTrue
".In this case we need another mechanism to detect if there are other documents with the same xml_hash being submitted but still not completely written to the database.
Reproducing duplication reliably
The main problem with testing a fix for this problem is that it usually happens only when you have many remote devices submitting forms with attachments using unreliable networks.
Given we were suspecting a race condition we wrote an async script that would submit the same form and its attachments 5 times from "parallel" async tasks (in hindsight using requests with multiprocessing would have been a better use of my time but hey, async is fashionable... 😅)
Possible solution
A PostgreSQL's Advisory lock is perfect for the task: it can work across transaction and session boundaries and allows to ensure atomic access using an integer id. So we got the xml_hash number, converted it to an integer then took its 60 least significant bits to obtain a (hopefully still) unique id for the lock.
The strategy is:
Testing the problem and the fix
If you extract the files from this duplication_repro.zip file, you'll find:
sample_form.xml
that you can import in your kf instance (a local development instance works fine)testfiles/
multiple_submission_test.py
scriptFirst of all, import the test form in your kf application, taking note of its ID (something like the
a7G7MvQMmzARfb2irNsyn3
already in the script)Create and activate a virtual environment if you're not already in one, then
pip install aiohttp aiofiles dict2xml
, then edit the script: in the first few lines after the imports, you need to change the following variables so they match your test instance:run the script from the same directory in which you extracted it: on an unpatched system most of the 5 submissions (often all) will get accepted and you'll have 5 new submissions with exactly the same content.
If you switch to a branch that includes this PR, you should get 1 accepted submission and 4
Duplicate submission
responses.Remaining issues and questions
Most important: I'm not sure if "DuplicateSubmission" is the best exception to raise: maybe telling the client to retry (how?) is a safer approach.
In my opinion the big question however is why it's so common to receive multiple submissions at the same time:
This was an obvious race condition, but there could be more. It took us a long time to find how to write a reproduction script able to work reliably both locally and remotely, so we were unable to deploy the fix to our production system in time (our data collection efforts were pratically closed when we finished this).
P.S.
Many thanks to whoever wrote the
clean_duplicated_submissions
admin command (probably @noliveleger)