feat: implement grading strategies for peer evaluations #2196

Merged: 30 commits merged into openedx:master from MJG/mean-grading-strategy on May 9, 2024

Conversation

@mariajgrimaldi (Member) commented Mar 21, 2024

TL;DR - This PR adds a new grading strategy that can be configured in the assessment step settings for peer steps, as an alternative to the default median calculation.

Currently, steps that calculate a final score from multiple scores (e.g. peer steps) use the median by default; the calculation is done here. The median has its advantages for this use case. However, making it the default with no configuration option rules out use cases that need the mean for the final score instead. Making the strategy configurable also opens the door to the other grading strategies mentioned in this thread.

That's why this product proposal was created and later approved.

What changed?

  • The student's score is obtained before the score is set here
  • Each step API has its own get_score function
  • The peer step's get_score obtains the median of all the scores the student has at the moment of grade generation by calling get_assessments_median_scores
  • get_assessments_median_scores calls get_median_scores_dict, which calculates the student's final score using the median.

To make the grading strategy configurable, we:

Changed each reference to get_assessments_median_scores to a more generic get_assessments_peer_scores_with_grading_strategy, which decides which calculation to use based on the workflow configuration. If the feature flag ENABLE_ORA_PEER_CONFIGURABLE_GRADING is off, the default get_assessments_median_scores is used.

By default, the median calculation is used.
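
To make the dispatch easier to picture, here is a minimal sketch in plain Python. Only get_assessments_median_scores, get_assessments_peer_scores_with_grading_strategy, and the ENABLE_ORA_PEER_CONFIGURABLE_GRADING flag are named in this PR; the signatures, the per-criterion dict, and the mean helper below are illustrative assumptions, not the real edx-ora2 API.

# Sketch only: the real code lives in the peer assessment API and workflow config.
import math
import statistics

def get_assessments_median_scores(scores_by_criterion):
    # Default behavior: median of each criterion's peer scores.
    return {
        criterion: statistics.median(scores)
        for criterion, scores in scores_by_criterion.items()
    }

def get_assessments_mean_scores(scores_by_criterion):
    # Alternative strategy: mean of each criterion's peer scores,
    # rounded up like the reviewed snippet further down this page.
    return {
        criterion: math.ceil(statistics.mean(scores))
        for criterion, scores in scores_by_criterion.items()
    }

def get_assessments_peer_scores_with_grading_strategy(
    scores_by_criterion, grading_strategy, feature_enabled
):
    # Choose the calculation from the workflow configuration; with the
    # feature flag off, always fall back to the median default.
    if feature_enabled and grading_strategy == "mean":
        return get_assessments_mean_scores(scores_by_criterion)
    return get_assessments_median_scores(scores_by_criterion)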

Developer Checklist

Testing Instructions

  1. Clone this branch into your file system
  2. Install edx-ora2 as an editable requirement (using pip's -e option)
  3. Now, add an ORA component to your course.
  4. Turn on the feature by adding: FEATURES["ENABLE_ORA_PEER_CONFIGURABLE_GRADING"] = True
  5. Configure the steps as follows so these test instructions make sense:
    [screenshot of step configuration]
  6. Use just one rubric configured with these option points:
    [screenshot of rubric option points]
  7. The median and the mean look alike when the dataset is not skewed, so to verify we're using the mean instead of the median, use the following scoring sequence (see the quick check after these instructions):
    Student 1 submits the ORA activity with a distinctive answer so we can identify it later
    Student 2 scores Student 1 with 0
    Student 3 scores Student 1 with 1
    Student 4 scores Student 1 with 5
    Now the median is 1 but the mean is 2, so check that the grade matches the mean value and that the explanation says mean instead of median.
    [screenshot]

If you don't configure mean grading, the default behavior is kept:
[screenshot]
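
As referenced in step 7, here is a quick arithmetic check of that scoring sequence using Python's statistics module (verification only, not part of the PR):

import statistics

peer_scores = [0, 1, 5]  # scores given by Students 2, 3, and 4
print(statistics.median(peer_scores))  # 1 -> what the default median strategy reports
print(statistics.mean(peer_scores))    # 2.0 -> what the mean strategy should report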

Reviewer Checklist

Collectively, these should be completed by reviewers of this PR:

  • I've done a visual code review
  • I've tested the new functionality

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Mar 21, 2024
@openedx-webhooks commented Mar 21, 2024

Thanks for the pull request, @mariajgrimaldi! Please note that it may take us up to several weeks or months to complete a review and merge your PR.

Feel free to add as much of the following information to the ticket as you can:

  • supporting documentation
  • Open edX discussion forum threads
  • timeline information ("this must be merged by XX date", and why that is)
  • partner information ("this is a course on edx.org")
  • any other information that can help Product understand the context for the PR

All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here.

Please let us know once your PR is ready for our review and all tests are green.

@itsjeyd itsjeyd added the core contributor PR author is a Core Contributor (who may or may not have write access to this repo). label Mar 22, 2024

codecov bot commented Mar 26, 2024

Codecov Report

Attention: Patch coverage is 95.33333%, with 7 lines in your changes missing coverage. Please review.

Project coverage is 95.06%. Comparing base (51d84c1) to head (69bc17a).

Files                                       Patch %   Lines
openassessment/xblock/utils/validation.py   16.66%    5 Missing ⚠️
openassessment/xblock/utils/xml.py          50.00%    1 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           master    #2196    +/-   ##
========================================
  Coverage   95.05%   95.06%            
========================================
  Files         193      193            
  Lines       21149    21285   +136     
  Branches     1906     1918    +12     
========================================
+ Hits        20104    20235   +131     
- Misses        780      785     +5     
  Partials      265      265            
Flag Coverage Δ
unittests 95.06% <95.33%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.


@mariajgrimaldi mariajgrimaldi force-pushed the MJG/mean-grading-strategy branch 2 times, most recently from 49a679f to 78549ab on March 26, 2024 19:50
@mariajgrimaldi mariajgrimaldi marked this pull request as ready for review March 26, 2024 20:10
@mariajgrimaldi mariajgrimaldi requested a review from a team as a code owner March 26, 2024 20:10
@mariajgrimaldi (Member, Author) commented Mar 26, 2024

I still need to generate all the translations so tests don't fail. Meanwhile, I'll open the PR for review. Thank you!

@mariajgrimaldi mariajgrimaldi changed the title [WIP] feat: implement mean score strategy feat: implement grading strategies for peer evaluations Mar 26, 2024
openassessment/assessment/models/base.py (review thread: outdated, resolved)
openassessment/assessment/models/base.py (review thread: outdated, resolved)
openassessment/assessment/models/base.py (review thread: outdated, resolved)
@mariajgrimaldi (Member, Author) left a comment


@BryanttV: I think I addressed all your comments. Could you review again?

I still need to fix the JS tests, so I'll work on that. Thanks!

@BryanttV (Contributor) commented Apr 4, 2024

Hi @mariajgrimaldi, I have tested again and everything seems to work correctly. Only a minor change would be needed. Thank you!

@itsjeyd itsjeyd added the waiting on author PR author needs to resolve review requests, answer questions, fix tests, etc. label Apr 12, 2024
@mariajgrimaldi mariajgrimaldi force-pushed the MJG/mean-grading-strategy branch 2 times, most recently from aa96648 to 436ffce on April 18, 2024 18:38
@mariajgrimaldi (Member, Author) commented:

Hello, @pomegranited! I hope you're doing well. Could you give us a hand reviewing this PR? We'd appreciate it.

Regarding the test failures, I'll be working on increasing the PR coverage this week. I hope to update the PR soon. Thanks!

@pomegranited (Contributor) left a comment


Hi @mariajgrimaldi , thank you for this feature! I think it'll be useful for content authors.

I noted a few minor nits, and had a question about rounding, but nothing that's blocking merge. If you agree with my suggested changes, could you apply them and do a version bump? I can merge tomorrow.

👍

  • I tested this on my tutor dev stack with:
    • the flag disabled (by default) -- no option provided, defaulted to median
    • the flag enabled -- options provided, changed to "mean", was not able to change after block was published
  • I read through the code
  • I checked for accessibility issues by using my keyboard to navigate the ORA editor.
  • Includes documentation
  • User-facing strings are extracted for translation

openassessment/assessment/api/peer.py (review thread: outdated, resolved)
openassessment/assessment/models/base.py (review thread: resolved)
openassessment/assessment/test/test_peer.py (review thread: outdated, resolved)
openassessment/xblock/grade_mixin.py (review thread: resolved)
# Excerpt under review (mean score calculation):
total_criterion_scores = len(scores)
if total_criterion_scores == 0:
    return 0
return math.ceil(sum(scores) / float(total_criterion_scores))
@pomegranited (Contributor) commented:

I got an unexpected mean score:

[screenshot of the unexpected mean score]

(it's the second default rubric, so poor=0, fair=1, good=3, excellent=5):
1 + 0 + 3 + 1 + 5 = 10, 10/5 = 2

Is it because there's no 2 in the rubric, so it rounds up to the next option, 3?
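
For reference, plugging the reviewer's five scores into the calculation quoted above (a standalone check, not code from the PR) shows the formula itself yields 2, which is why the thread then looks at which assessments are actually counted:

import math

scores = [1, 0, 3, 1, 5]
print(math.ceil(sum(scores) / float(len(scores))))  # ceil(10 / 5) = 2, not 3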

@mariajgrimaldi (Member, Author) commented May 8, 2024

This threw me for a loop, so I tried replicating the same scenario. Here are the criteria I used:
Ideas: poor=0, fair=3, good=5
Content: poor=0, fair=1, good=3, excellent=5
Graded by 5 configuration
And got:
image

While researching, I found that the assessments are retrieved, ordered, and then truncated by the number of peer grades needed. Could you confirm that the number of peer grades configured while testing was 5?

@mariajgrimaldi (Member, Author) commented:
I also reproduced the same scenario with the graded-by-3 configurations and got 2 since the 1st three peers graded 1, 0, and 3 in that order. Please let me know if the submission order also affected your tests.
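
A hedged sketch of how the retrieve, order, and truncate behavior described above can make submission order matter; the helper and cutoff logic here are assumptions drawn from this discussion, not the actual query:

import math

def mean_of_counted_peers(scores_in_received_order, must_be_graded_by):
    # Only the first N assessments, in the order they were received,
    # are counted before averaging, so order can change the result.
    counted = scores_in_received_order[:must_be_graded_by]
    return math.ceil(sum(counted) / float(len(counted)))

print(mean_of_counted_peers([1, 0, 3, 1, 5], 3))  # ceil(4/3) = 2, as in the graded-by-3 test
print(mean_of_counted_peers([5, 3, 1, 1, 0], 3))  # same scores reordered -> ceil(9/3) = 3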

@pomegranited (Contributor) commented May 9, 2024

Oh wow, submission order does affect the final score! That's annoying.

> I also added an extra test here with the exact values: https://github.com/openedx/edx-ora2/pull/2196/files#diff-7fe9b2681acaf897f2dd62a13553badf288fd6144832ce0562c4a58bd2640e7fR2381

Thanks for that -- I also ran this test with the numbers shuffled in a different order, to confirm that it's not the "mean score" logic that's causing this issue. It must be something done after that.

--- a/openassessment/assessment/test/test_peer.py
+++ b/openassessment/assessment/test/test_peer.py
@@ -2375,6 +2375,7 @@ class TestPeerApi(CacheResetTest):
         self.assertEqual(31, Assessment.get_mean_score([5, 6, 12, 16, 22, 53, 102]))
         self.assertEqual(31, Assessment.get_mean_score([16, 6, 12, 102, 22, 53, 5]))
         self.assertEqual(2, Assessment.get_mean_score([0, 1, 3, 1, 5]))
+        self.assertEqual(2, Assessment.get_mean_score([5, 3, 1, 1, 0]))

> Could you confirm that the number of peer grades configured while testing was 5?

Yes, 5 peer grades were required, and only 5 were received.

I used the default graded-by-5 configuration, with
Ideas: poor=0, fair=3, good=5
Content: poor=0, fair=1, good=3, excellent=5

Audit user received the same "Content" peer assessments as Honor, but in a different order, and so they got different scores. (Apologies about the "Ideas" assessments, I thought I did them all the same too, but apparently not.)

Audit received:
[screenshot of Audit's peer assessments]

Honor received:
[screenshot of Honor's peer assessments]

@pomegranited (Contributor) commented:

@mariajgrimaldi I've re-tested with your changes and, with the exception of the ordering issue discussed above, I found no other issues.

Redwood gets cut soon, so I'm happy to merge this now, and then you can do a follow-up PR to fix the ordering bug? We'll also need a version bump to get this into edx-platform:

  • Bumped version number in openassessment/__init__.py and package.json

@mariajgrimaldi (Member, Author) commented:

@pomegranited: thank you again for such a detailed review! We appreciate it. I just bumped the version number in openassessment/__init__.py and package.json, and I'll be researching the grading ordering issue further over the next few days. I'll let you know what I find!

@pomegranited pomegranited merged commit c2dc989 into openedx:master May 9, 2024
11 checks passed
@openedx-webhooks commented:

@mariajgrimaldi 🎉 Your pull request was merged! Please take a moment to answer a two-question survey so we can improve your experience in the future.

@pomegranited (Contributor) commented:

Awesome, thank you for your rapid reply @mariajgrimaldi! ora2==6.10.0 is released and ready for you to do a version bump PR in edx-platform if you want.

@itsjeyd itsjeyd removed the waiting on author PR author needs to resolve review requests, answer questions, fix tests, etc. label May 23, 2024
Labels
core contributor: PR author is a Core Contributor (who may or may not have write access to this repo).
open-source-contribution: PR author is not from Axim or 2U
5 participants