Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the LMSCourse backfill from grouping migration #6612

Merged
merged 1 commit into from
Aug 29, 2024

Conversation

marcospri
Copy link
Member

@marcospri marcospri commented Aug 29, 2024

For:

The existing LMSCourse has two assumptions, with one unique index for each:

  • The same course (same authority_provided_id) can belong to multiple applications instances but we'll keep only one row in LMSCourse for it (unique h_authority_provided_id)

  • authority_provided_id is generated from guid, lti_context_id, then, those two columns should also be unique together.

There's however a case when this is not true:

  • Applications instance with a null GUID launches and creates a course (None, CONTEXT_ID). This generates a hashed authority_provided_id.

  • The same application instance (ie, same row in ApplicationInstance) starts sending a GUID. We calculate a new (GUID, CONTEXT_ID) hash creating a new course, with a different authority_provided_id.

When we join with ApplicationInstance for the backfill we only have the most recent GUID and we end up creating (GUID, CONTEXT_ID) duplicates.

The solution is to exclude this type of duplicate, which is safe because this only happens on:

  • GUID changes from null to not null. We don't allow other value changes for GUID.

  • We only deduplicate courses when we have a newer version of the course, the older one can no longer be launched, the ApplicationInstance now has a GUID and it will sent along with every launch.

Testing

The select side of the migration can be tested.

Note the two different rows for the same course with two different authority_provided_ids

The new query removes the duplicate, only taking most recent course.

The existing LMSCourse has two assumptions, with one unique index for
each:

- The same course (same authority_provided_id) can belong to multiple
  applications instances but we'll keep only one row in LMSCourse for it
  (unique h_authority_provided_id)

- authority_provided_id is generated from guid, lti_context_id, then,
  those two columns should also be unique together.

There's however a case when this is not true:

- Applications instance with a null GUID launches and creates a course
  (None, CONTEXT_ID). This generates a hashed authority_provided_id.

- The same application instance (ie, same row in ApplicationInstance) starts sending a GUID.
  We calculate a new (GUID, CONTEXT_ID) hash creating a new course, with a different authority_provided_id.

When we join with ApplicationInstance for the backfill we only have the
most recent GUID and we end up creating (GUID, CONTEXT_ID) duplicates.

The solution is to exclude this type of duplicate, which is safe
because this only happens on:

- GUID changes from null to not null. We don't allow other value changes
  for GUID.

- We only deduplicate courses when we have a newer version of the course,
  the older one can no longer be launched, the ApplicationInstance now has a GUID and it will sent along with every launch.
@marcospri marcospri marked this pull request as ready for review August 29, 2024 08:42
@marcospri marcospri requested a review from seanh August 29, 2024 08:51
@marcospri marcospri mentioned this pull request Aug 29, 2024
21 tasks
@@ -13,21 +13,26 @@ def upgrade() -> None:
sa.text(
"""
WITH backfill as (
-- Deduplicate "grouping" courses on authority_provided_id
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block is used verbatim, it just become a subquery.

-- Pick the most recent "grouping" there are duplicates
ORDER BY authority_provided_id, "grouping".updated desc
-- Deduplicate on guid, lms_id
SELECT DISTINCT on (tool_consumer_instance_guid, lms_id) *
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New, distinct on, (tool_consumer_instance_guid, lms_id)

@marcospri marcospri merged commit 7489e5a into main Aug 29, 2024
9 checks passed
@marcospri marcospri deleted the fix-course-backfill branch August 29, 2024 13:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants