Update the crawl condition when continuing the loop early #178

Status: Open · wants to merge 1 commit into base: MOODLE_310_STABLE
classes/robot/crawler.php (2 additions & 0 deletions)
@@ -556,6 +556,7 @@ public function process_queue($verbose = false) {
 
             // Get a new zero second timeout lock for the resource.
             if (!$lock = $lockfactory->get_lock($resource, 0)) {
+                $hastime = time() < $cronstop; // Check if we need to break from processing
                 continue; // Try crawl the next node, this one is already being processed.
             }

@@ -567,6 +568,7 @@ public function process_queue($verbose = false) {
                 $persistent = new url(0, $node);
                 $persistent->update();
                 $lock->release();
+                $hastime = time() < $cronstop; // Check if we need to break from processing
@catalystfd (Author) commented:
I suspect the flawed logic might be in this loop specifically, since when stracing the crawler processes they seemed to be spinning: calling update and then checking the lock advisory table.

@catalystfd (Author) commented on Jul 3, 2023:
Perhaps it is specifically courses that have a lastcrawled of null: they are always eligible for queue selection.

And setting needscrawl to lastcrawled will simply set it to null, which means they will always be eligible for queue selection even if the course is never in recent courses. That would be an infinite loop.

And if all the available cron processes are tied up running this loop, there will never be time or space to update the needscrawl field.

I think this change is still good, but perhaps we also need to skip this entire if block if the node's lastcrawled is null, along the lines of the sketch below.
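
A minimal sketch of that guard, assuming the block in question resets needscrawl from lastcrawled before re-queueing, as described above (names mirror this diff; the actual code in crawler.php may be shaped differently):

```php
// Hypothetical guard, not the actual crawler.php code: only reset
// needscrawl when the node has been crawled at least once. A null
// lastcrawled would otherwise be copied into needscrawl, leaving the
// node permanently eligible for queue selection.
if ($node->lastcrawled !== null) {
    $node->needscrawl = $node->lastcrawled; // Assumed reset, per the comment above.
    $persistent = new url(0, $node);
    $persistent->update();
}
```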

                 continue;
             }

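For context, a simplified sketch of the loop shape this change assumes (hypothetical names such as $maxcrontime and next_node(); the real process_queue() does much more). If $hastime is only refreshed at the bottom of the loop body, any continue that jumps back to the condition early leaves it stale, and the worker can spin well past its cron deadline:

```php
$cronstop = time() + $maxcrontime; // Hypothetical per-run time budget.
$hastime = true;

while ($hastime && ($node = $this->next_node())) { // next_node() is illustrative.
    // Get a new zero second timeout lock for the resource.
    if (!$lock = $lockfactory->get_lock($resource, 0)) {
        // Refresh the condition before bailing out; without this line the
        // continue skips the refresh at the bottom of the loop.
        $hastime = time() < $cronstop;
        continue;
    }

    // ... crawl the node, update it, release the lock ...

    $hastime = time() < $cronstop; // Normal end-of-iteration refresh.
}
```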