Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Outreachy Round 27] Make RevisionScoreImporter import data from both LiftWing API and reference-counter API. #5594

Conversation

gabina
Copy link
Member

@gabina gabina commented Jan 23, 2024

What this PR does

This PR is part of the "Improve how Wiki Education Dashboard counts references added" project (read issue #5547).

The purpose of this PR is to modify the existing RevisionScoreImporter class to not only use the Lift Wing API, but also to hit the new reference-counter one. Notice that these APIs are not available for every wiki:

  • Lift Wing API only works for 1) wikidata and 2) wikipedia for articlequality model languages (en, eu, fa, fr, gl, nl, pt, ru, sv, tr, and uk).
  • reference-counter API works for every wiki, except for wikidata.

To abstract the logic related to deciding which API/APIs hit to get revisions data for a given wiki, this PR adds a new RevisionScoreApiHandler intermediate class that deal with it.

When merging and deploying this PR, features and features_previous fields in revisions table will store a hash with not only the data from Lift Wing API (if any), but also a num_ref key with the number of references got from the reference-counter API (if applicable).

Notice that even though this PR populates the features and features_previous fields with num_ref data, this it not used when calculating the references added in Revision.references_added method. So no change should be visible from the user's point of view.

Open questions and concerns

One idea to evaluate before deploying this is to make the process of starting to use the new reference-counter API more progressive. One easy way to do this could be to temporarily modify the valid_wiki? definition for the ReferenceCounterApi. This way we could enable this new API only for some specific wikis, for example es.wikipedia. That way we can monitor the new process taking less risk. After the updates of some courses, if everything works as expected, we can enable the ReferenceCounterApi for every wiki except for wikidata.

…unexpected error, the return value is a hash with fields: wp10, features, deleted, and prediction.
…case of unexpected error, the return value is a hash with num_ref field.
…ftWingApi/ReferenceCounterApi. This RevisionScoreApiHandler API takes care of the logic to decide which APIs are available for a given wiki, hits the APIS, and to merge the responses.
@gabina gabina marked this pull request as draft January 23, 2024 15:56
@gabina
Copy link
Member Author

gabina commented Jan 26, 2024

Failure doesn't seem to be related to my changes:

  1) Namespace-specific stats generates and renders stats for Cookbook namespace on en.wikibooks.org
     Got 0 failures and 2 other errors:

     1.1) Failure/Error: visit "/courses/#{course.slug}/manual_update"

          Net::ReadTimeout:
            Net::ReadTimeout with #<Socket:(closed)>
          # ./spec/features/namespace_stats_spec.rb:29:in `block (3 levels) in <top (required)>'
          # ./spec/features/namespace_stats_spec.rb:28:in `block (2 levels) in <top (required)>'

     1.2) Failure/Error: Capybara::Screenshot.new.screenshot_and_save_page if example.exception

          NoMethodError:
            undefined method `new' for Capybara::Screenshot:Module

                Capybara::Screenshot.new.screenshot_and_save_page if example.exception
                                    ^^^^
          # ./spec/rails_helper.rb:101:in `block (2 levels) in <top (required)>'

@gabina gabina marked this pull request as ready for review January 26, 2024 19:23
@gabina gabina changed the title [WIP] [Outreachy Round 27] Make RevisionScoreImporter import data from both LiftWing API and reference-counter API. [Outreachy Round 27] Make RevisionScoreImporter import data from both LiftWing API and reference-counter API. Jan 26, 2024
@gabina
Copy link
Member Author

gabina commented Jan 26, 2024

@ragesoss @Aminehassou this PR is ready for review. Feel free to review whenever you have the chance!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One idea to evaluate before deploying this is to make the process of starting to use the new reference-counter API more progressive. One easy way to do this could be to temporarily modify the valid_wiki? definition for the ReferenceCounterApi. This way we could enable this new API only for some specific wikis, for example es.wikipedia. That way we can monitor the new process taking less risk. After the updates of some courses, if everything works as expected, we can enable the ReferenceCounterApi for every wiki except for wikidata.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that is necessary. We can deploy this to https://dashboard-testing.wikiedu.org/ to try it out in a production-like environment and make sure everything seems to be working.

… retries the same request up to 5 times if it raises an error.
@gabina
Copy link
Member Author

gabina commented Feb 2, 2024

Note about why I added a retry strategy:

For the quality report, I went through the entire process of populating the features and features_previous fields using both APIs. I observed that the most frequent API error was "unable to solve domain," a transient issue that typically resolves itself upon retrying the request.

In the previous implementation, when faced with this temporary error, the features and features_previous fields were set to nil, with the expectation that they would likely be populated in the next automatic statistics update task.

However, in the new implementation, if this temporary error occurs, the features and features_previous fields won't be set to nil. For instance, it might fail the request to Toolforge but succeed with LiftWing. Consequently, these fields won't be populated in the subsequent UpdateCourseStats periodic task run.

To address this, I introduced a straightforward retry strategy. This ensures a higher likelihood of successfully populating the fields, even in the presence of temporary errors.

@ragesoss
Copy link
Member

@Aminehassou have you looked at this already, and if not would you like to? (If you don't want to, that's fine, just want to make sure you have a chance to review it if you were planning to.)

@Aminehassou
Copy link
Contributor

You can go ahead and review it, I've taken a look at it and it looks good!

@ragesoss
Copy link
Member

@Aminehassou thanks! it looks good to me as well, I'm going to merge and deploy it to staging so we can put it through a few paces before we go live.

@ragesoss ragesoss merged commit beb518b into WikiEducationFoundation:master Feb 15, 2024
1 check failed
@gabina gabina deleted the 5547-outreachy-round-27-use-new-reference-counter-api-in-revision-score-importer branch May 14, 2024 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants