[Outreachy Round 27] Stop storing liftwing features for non-wikidata wikis. #5682
What this PR does
This PR is part of the "Improve how Wiki Education Dashboard counts references added" project (see issue #5547).
Before this PR, wikis supported by both the LiftWing and reference-counter APIs had both API responses stored in the features/features_previous fields.
After this PR, only Wikidata wikis (which are supported only by the LiftWing API) have LiftWing features stored in the features/features_previous fields. Other, non-Wikidata wikis, like fr.wikipedia or es.wiktionary, will store only the reference-counter response in their features/features_previous fields.
Context on why we're changing this
The revision score importing process is conducted through automatic jobs as part of the course updates. The RevisionScoreImporter class is responsible for selecting "unscored" revisions and querying the API to populate the revision score fields (such as features, wp10, etc.). It determines if a revision is "unscored" by checking if the features or features_previous fields are nil.
As I understand it, if the LiftWing API query previously failed for any reason, the features field remained nil. Consequently, the RevisionScoreImporter would attempt to populate that field during the next run, as the revision would still be considered "unscored".
Now, with the introduction of two APIs (LiftWing and reference-counter), and storing values from both in the features field, this behavior has changed. For instance, if a LiftWing API request fails unexpectedly, the features field will not be nil because it will contain the response from the reference-counter API. As a result, during the subsequent course update run, the revision score importer will not attempt to query the LiftWing API again to complete the features field.
Although we have implemented a retry strategy when querying APIs, prolonged downtime of one API could lead to many revisions remaining without complete data.
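The behavior change described above can be illustrated with a minimal sketch. This is not the Dashboard's actual code: `unscored?`, `merged_features`, the plain hashes standing in for revisions, and the 'num_ref' key are all hypothetical, chosen only to show how a merged features value can hide a failed LiftWing request.

```ruby
# Hypothetical stand-in for the importer's "unscored" check:
# a revision needs scoring when either features field is nil.
def unscored?(revision)
  revision[:features].nil? || revision[:features_previous].nil?
end

# Hypothetical merge of both API responses into one features hash.
# A failed request is modeled as nil.
def merged_features(liftwing_response, reference_counter_response)
  return nil if liftwing_response.nil? && reference_counter_response.nil?

  (liftwing_response || {}).merge(reference_counter_response || {})
end

# One API only: a failed LiftWing request leaves features nil,
# so the revision is picked up again on the next run.
single_api = { features: nil, features_previous: nil }

# Two APIs merged: LiftWing failed, reference-counter succeeded.
# 'num_ref' is an assumed key name used only for illustration.
partial = merged_features(nil, { 'num_ref' => 3 })
two_apis = { features: partial, features_previous: partial }

puts unscored?(single_api) # prints "true"
puts unscored?(two_apis)   # prints "false": the missing LiftWing data is never retried
```

In the second case the revision looks scored even though the LiftWing part of the data is missing, which is the situation this PR tries to avoid.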
Open questions and concerns
LiftWing features are still stored in the features/features_previous fields for Wikidata wikis because they're not supported by the new reference-counter API, so references have to be calculated from LiftWing features. However, I'm not sure whether Wikidata uses a completely different approach to count references. In that case, maybe Wikidata doesn't need features at all.
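The storage rule this PR introduces could be summarized roughly as follows. This is a sketch under assumptions, not the actual implementation: `features_to_store` and its parameters are hypothetical names, and the wiki is identified here by a simple project string.

```ruby
# Hypothetical helper: pick which API response ends up in the
# features / features_previous fields for a given wiki project.
def features_to_store(wiki_project, liftwing_features, reference_counter_features)
  if wiki_project == 'wikidata'
    # reference-counter does not support Wikidata, so LiftWing is the only source.
    liftwing_features
  else
    # For every other wiki, only the reference-counter response is stored.
    reference_counter_features
  end
end

# Example: a failed reference-counter request (nil) on a Wikipedia revision
# leaves features nil, even if LiftWing answered.
p features_to_store('wikipedia', { 'some_liftwing_feature' => 1 }, nil) # prints "nil"
```

Under this rule each features field comes from a single API, so a nil value again means "the relevant request failed" and the revision stays eligible for a retry.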