Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Severe performance issues #114

Open
aspark21 opened this issue Oct 12, 2021 · 34 comments
Open

Severe performance issues #114

aspark21 opened this issue Oct 12, 2021 · 34 comments
Assignees
Labels
Needs backporting Nightmare For issues that are complex and horrible.

Comments

@aspark21
Copy link

Hi Gareth,

This capability checks is extremely problematic - it's occurring 100s of times per page load and taking 120s for that to load
https://github.com/gjb2048/moodle-format_topcoll/blob/MOODLE_39/classes/activity.php#L129-L173

In particular this method is probable issue:
private static function calulatecoursemodules($courseid, $students, $modid = null) {

especially this Line 757 - if (((method_exists($usermod, 'is_visible_on_course_page')) && ($usermod->is_visible_on_course_page()))

We're in CI mode as whole site went down earlier today, it seems other universities are also affected with performance issues on earlier versions of topcoll. We've had a long running item around coursemodinfo cache being problematic for some users for the last year or so. And had been pursuing https://tracker.moodle.org/browse/MDL-55231 - which is not quite ready but close. Though there's thinking needed about wether there is something about topcoll (or topcoll + adaptable) which could make this worse.

We should have Cat AU looped in overnight who may have some recommendations.

Al
P.S. will email you to bring you into the loop on this

@aspark21
Copy link
Author

Actually there's no data in this so posting the xhprof traces here:
course-view

course-editbulkcompletion

@gjb2048 gjb2048 self-assigned this Oct 12, 2021
@gjb2048
Copy link
Collaborator

gjb2048 commented Oct 12, 2021

Dear @aspark21,

I have just about had enough of this code (sick to the back teeth) that I've inherited from Adaptable and indeed within Adaptable itself. If its problematic, then please do turn it off.

The code in question, only follows what core wrote. Before that the answer was inaccurate and confusing. I see no other way of writing it (bar removing the method exists call).

Also, please could you test the M3.11 version (https://moodle.org/mod/forum/discuss.php?d=425903) as that now employs the use of the Moodle cache, and should be much faster after initial calculations.

Gareth

@aspark21
Copy link
Author

How do we turn it off?

we're on cd44254 which includes all the cache improvements of the 3.9.1.6 release on Moodle 3.9.10

@gjb2048
Copy link
Collaborator

gjb2048 commented Oct 12, 2021

Settings can be used: https://github.com/gjb2048/moodle-format_topcoll/blob/MOODLE_39/settings.php#L505-L589 - but that would mean resetting the default layout, probably needs rethinking that bit as a separation / better switch. So... comment out the code:

https://github.com/gjb2048/moodle-format_topcoll/blob/MOODLE_39/classes/course_renderer.php#L220-L234

and the events:

https://github.com/gjb2048/moodle-format_topcoll/blob/MOODLE_39/classes/observer.php

which could be better and react to the settings. EDIT, does react to the settings: https://github.com/gjb2048/moodle-format_topcoll/blob/MOODLE_39/classes/activity.php#L1230-L1243, so just need a better 'reset' to default for that setting as a single element / have the course setting follow the default as the other's now do. EDIT: Which it does https://github.com/gjb2048/moodle-format_topcoll/blob/MOODLE_39/lib.php#L774-L791, so now only need to separate into its own reset category so that courses not following the default can be reset to do so. i.e. when its changed to off.

@gjb2048
Copy link
Collaborator

gjb2048 commented Oct 12, 2021

Why does applying activity completion trigger a 'course_module_updated' event? I would have thought that would only happen when the teacher actually changed it.

@gjb2048
Copy link
Collaborator

gjb2048 commented Oct 13, 2021

Ok, improvements:

Separated out the reset so that its separate and easier to disable on a / all courses: a9fa4de (Don't apply this: a9fa4de#diff-8f5ff72ef1cf59199cfc4bbf16f2475d09e448601242f0d2c51bb94bd4aa1395R1572).

Removed 'method_exists': 8a0dfea

Only calculate the students once per course: 9a1afdc.

@gjb2048
Copy link
Collaborator

gjb2048 commented Oct 13, 2021

Small optimisation: 59aaff9

@gjb2048
Copy link
Collaborator

gjb2048 commented Oct 13, 2021

@aspark21 With the second flow, why is '\core\event\course_module_completion_updated' not used instead?

@gjb2048
Copy link
Collaborator

gjb2048 commented Oct 13, 2021

@aspark21 Ok, would a setting that prevented the activity information being calculated / shown if there was more than 'x' students on the course help?

@gjb2048 gjb2048 added Needs backporting Nightmare For issues that are complex and horrible. and removed Pending replication labels Oct 13, 2021
@aspark21
Copy link
Author

We turned off those settings yesterday morning and site performed ok. Did not comment out any code yet.

We're however still seeing a significant amount of cache invalidations so it seems there is still something touching the caches outside of those features.

The key here is the cache is invalidated very frequently(100s of times/hour), not just for that specific course but for all courses and the cache then needs to get rebuilt. It's disproportional to the amounts of actual course edits(90 in a day).

@gjb2048
Copy link
Collaborator

gjb2048 commented Oct 14, 2021 via email

@gjb2048
Copy link
Collaborator

gjb2048 commented Oct 14, 2021 via email

@gjb2048
Copy link
Collaborator

gjb2048 commented Oct 18, 2021

@aspark21 Any news on this please? At the moment I'm a bit in the dark, so can only operate a little on guesswork, I need more concrete facts / evidence.

@aspark21
Copy link
Author

Hi Gareth,

Yes, the team will provide the additional info shortly

@gjb2048
Copy link
Collaborator

gjb2048 commented Oct 18, 2021

Hi Alistair,

Thanks. The UCL team?

Gareth

@gjb2048
Copy link
Collaborator

gjb2048 commented Oct 18, 2021

@aspark21 Please note that the MOODLE_311 branch is being updated / improved in this area.

@gjb2048
Copy link
Collaborator

gjb2048 commented Oct 25, 2021

@aspark21 Been 7 days, any news on this please? I feel like a complete mushroom at the moment and working on pure guess work. I really need the events / stack traces for:

We're however still seeing a significant amount of cache invalidations so it seems there is still something touching the caches outside of those features.

The key here is the cache is invalidated very frequently(100s of times/hour), not just for that specific course but for all courses and the cache then needs to get rebuilt. It's disproportional to the amounts of actual course edits(90 in a day).

Then I might have a hope at fixing this for you!

@aspark21
Copy link
Author

Hi Gareth,

Sorry for the slow replies, been pretty tied up in the incident and looking for other causes.

We found the root cause for our incident to actually be https://tracker.moodle.org/browse/MDL-72837 - so I assume

I will rerun some stack traces this week now we've fixed the underlying flaw to see how this code is now performing.

Al

@gjb2048
Copy link
Collaborator

gjb2048 commented Oct 25, 2021

Dear @aspark21,

Thank you for getting back to me. I've now improved the code with a new admin setting 'courseadditionalmoddatamaxstudents' (you'll find this on the MOODLE_311 and master (for M4.0) branches), whereby if not zero, then additional information will be turned off if the number of students is above the number set. Of course, the number of students is still calculated, but there should be a performance increase as the rest isn't. The course settings will inform an editing teacher of the situation when this happens.

This is on top of all the other settings to turn elements on / off for this.

G

@gjb2048
Copy link
Collaborator

gjb2048 commented Nov 3, 2021

Dear @aspark21,

Any news on this please as its driving me nuts that I cannot make progress on this. I'm have a 'get the job done' attitude and this is stalled, which is very frustrating. If there is a bug to fix, then it needs fixing as I'll only then burn more duplicate time making more releases as other things make progress and this doesn't.

Gareth

@aspark21
Copy link
Author

Hi Gareth,

We won't be able to feedback on your improvements, as you only made the change in MOODLE_311 and not in MOODLE_39.

I've go an xhprof here of course/view.php now that we've fixed our underlying infrastructure issue. And you can see this is performing adequately. This is with the Additional Info switched off at Site level. The slow thing is calendar / course overview related, so nothing collapsed topics related & overall page load is not that slow.
callgraph-courseview-fixed-noadditional

I have re-enabled the Additional Info at site level just now, so will provide similar trace towards the end of the day, for comparison.

@gjb2048
Copy link
Collaborator

gjb2048 commented Nov 10, 2021 via email

@gjb2048
Copy link
Collaborator

gjb2048 commented Nov 10, 2021

@aspark21 But I have added the caching to the M3.9 version and released it! Please see point '2' for 'Version 3.9.1.6' on https://github.com/gjb2048/moodle-format_topcoll/blob/MOODLE_39/Changes.md.

@aspark21
Copy link
Author

That wasn't a change for us - we were on MOODLE_39 @ cd44254 which already had that change in. We've just updated to MOODLE_39 @ 2945065 which only had trivial changes (#105 and #111)

I was referring to the new 'courseadditionalmoddatamaxstudents' setting you mentionned #114 (comment)

@gjb2048
Copy link
Collaborator

gjb2048 commented Nov 11, 2021

@aspark21 Ah, ok, but what about this:

You:

"We're however still seeing a significant amount of cache invalidations so it seems there is still something touching the caches outside of those features."

"The key here is the cache is invalidated very frequently(100s of times/hour), not just for that specific course but for all courses and the cache then needs to get rebuilt. It's disproportional to the amounts of actual course edits(90 in a day)."

Me:

"Why does applying activity completion trigger a 'course_module_updated' event? I would have thought that would only happen when the teacher actually changed it."

"With the second flow, why is '\core\event\course_module_completion_updated' not used instead?"

"What events are causing those invalidations and what is generating them?"

You:

"Yes, the team will provide the additional info shortly"

Thus the 'courseadditionalmoddatamaxstudents' is outside of the event driven cache code that was already in the M3.9 version that I thought you were testing and I could possibly identify if I needed to raise a tracker issue for the 'completion' when it updates and generates the wrong event. The 'courseadditionalmoddatamaxstudents' setting only mitigates the issue but won't identify the "The key here is the cache is invalidated very frequently(100s of times/hour), not just for that specific course but for all courses and the cache then needs to get rebuilt. It's disproportional to the amounts of actual course edits(90 in a day)." issue I thought you were getting for me, please :).

@aspark21
Copy link
Author

For "We're however still seeing a significant amount of cache invalidations so it seems there is still something touching the caches outside of those features."

The reason for that was https://tracker.moodle.org/browse/MDL-72837 - An infrastructure & cache config combo that had a bug in core. So not Collapsed Topics related. The human way I explain this bug is that: "stacked should rebuild cache once & then copy from shared to local 90 times (number of frontends) but it isn't because bug means it rebuilds unnecessarily. single layer rebuilds once, stacked was rebuilding 900 times."

What we are still aware of is that editing any activity results in the full rebuild of the coursemodinfo cache which is a concern for large/very large courses (users x content) - there is improvements in https://tracker.moodle.org/browse/MDL-55231 which were going to make it into 4.0 but are now pushed back to 4.1; we're looking at backporting & reworking it for 3.9.

The events thing we assumed was students doing the manual ticking of activity completions. We didn't explore further as the root cause of the CI was elsewhere.

@gjb2048
Copy link
Collaborator

gjb2048 commented Nov 11, 2021

@aspark21 Ah ok, so its a distributed enterprise computing locking issue. Course editing won't result in a full rebuild of a cache, only the part pertaining to a given course.

@gjb2048
Copy link
Collaborator

gjb2048 commented Nov 11, 2021

Note to self: Investigate the event type generated by course completion.

@gjb2048
Copy link
Collaborator

gjb2048 commented Nov 11, 2021

@aspark21 Are you also saying that I've spent lots of time investigating this and it's transpired not to be a bug in my code but in fact to do with core code in relation to your enterprise structured Moodle system?

@aspark21
Copy link
Author

Any course editing does currently result in a full rebuild for a particular course. This will hopefully change in Moodle 4.1 to only updating the particular activity within the cache for that course.

I did say so earlier but apologies for the brevity which I guess didn't fully convey the meaning - #114 (comment)

So while I don't think there is a mysterious unidentified performance issue with other aspects of Collapsed Topics, there is definitely some issues with the additional activity info.

Now that we have a setup that isn't affected by the unnecessary rebuilds, I re-enabled all of the additional activity info on Wednesday afternoon.
This lead to one of the courses refusing to load. It has 7000 users (mostly students) in it. So the looping stuff is an issue. I will re-enable tomorrow now that we've updated. but I was hoping the 'courseadditionalmoddatamaxstudents' was going to help with that, in case we wanted to re-enable the additional info

@gjb2048
Copy link
Collaborator

gjb2048 commented Nov 11, 2021

@aspark21 I thought this issue pertained to the activity info and not 'other aspects'? What 'some issues with the additional activity info'????? Because as far as I can tell, I've written the caching and event mechanism as it should be to react to changes in the data that would make the generated activity data invalid. Its even granular enough to only perform what it needs to when a single module changes.

I know the code I've written takes more time, but it is using the core API and indeed actually produces accurate rather than false data.

Please provide the event information and evidence for "Any course editing does currently result in a full rebuild for a particular course. This will hopefully change in Moodle 4.1 to only updating the particular activity within the cache for that course." as I'm observing the module not course events and thus only regenerate what I need to. Unless you mean the 'rebuild' pertains to the core distributed cache issue you mentioned?

Time taken to read previous response and write this comment = 15 mins.

@gjb2048
Copy link
Collaborator

gjb2048 commented Nov 12, 2021

Note to self: Could possibly improve the user enrolment events to be user granular and store on a per module basis the list of users that make up a given count and therefore react accordingly to the different types of event rather than the cache clear ATM.

@aspark21
Copy link
Author

Reminder: This - #114 (comment) - was referring to other aspects. Which I've already explained wasn't Collapsed Topics related.
But you then asked "Are you also saying that I've spent lots of time investigating this and it's transpired not to be a bug in my code but in fact to do with core code in relation to your enterprise structured Moodle system?"
Which this was clarifying it wasn't wasted time - "So while I don't think there is a mysterious unidentified performance issue with other aspects of Collapsed Topics, there is definitely some issues with the additional activity info."

For "Any course editing does currently result in a full rebuild for a particular course. This will hopefully change in Moodle 4.1 to only updating the particular activity within the cache for that course."
I am referring to the coursemodinfo cache specifically Gareth, this is core Moodle behaviour.

I'm not questioning the accuracy of the additional mod info data, just it's ability to scale.
I see you've now backported the 'courseadditionalmoddatamaxstudents' into MOODLE_39. Our next code release is in mid-January now so will have to evaluate that then.

@gjb2048
Copy link
Collaborator

gjb2048 commented Dec 6, 2023

Will be additionally resolved by #145.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs backporting Nightmare For issues that are complex and horrible.
Projects
None yet
Development

No branches or pull requests

2 participants