Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable URL Metric storage locking by default for administrators #1835

Merged
merged 13 commits into from
Feb 4, 2025

Conversation

westonruter
Copy link
Member

@westonruter westonruter commented Jan 29, 2025

Fixes #1829

This changes the default value passed into the od_url_metric_storage_lock_ttl from 60 to 0 when the current user is an administrator. Additionally, a new od_store_url_metric_now meta capability is introduced which maps to manage_options by default. This gives sites the ability to customize which users have access to that capability.

In practice this will eliminate the need for the following code in the Optimization Detective Dev Mode plugin as long as the developer is testing while being logged-in as an administrator:

// During development, this can be useful to set to zero so that you don't have to wait for new URL Metrics to be requested when engineering a new optimization.
add_filter(
	'od_url_metric_freshness_ttl',
	static function () {
		return 0;
	}
);

@westonruter westonruter added [Type] Enhancement A suggestion for improvement of an existing feature [Plugin] Optimization Detective Issues for the Optimization Detective plugin labels Jan 29, 2025
Copy link

codecov bot commented Jan 29, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 65.92%. Comparing base (2fa59f0) to head (2c5ea50).
Report is 20 commits behind head on trunk.

Additional details and impacted files
@@            Coverage Diff             @@
##            trunk    #1835      +/-   ##
==========================================
+ Coverage   65.86%   65.92%   +0.05%     
==========================================
  Files          88       88              
  Lines        6873     6885      +12     
==========================================
+ Hits         4527     4539      +12     
  Misses       2346     2346              
Flag Coverage Δ
multisite 65.92% <100.00%> (+0.05%) ⬆️
single 38.22% <76.92%> (-0.23%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@westonruter westonruter marked this pull request as ready for review January 29, 2025 22:45
Copy link

github-actions bot commented Jan 29, 2025

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Co-authored-by: westonruter <[email protected]>
Co-authored-by: felixarntz <[email protected]>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@westonruter westonruter changed the title Disable URL Metric storage locking for administrators Disable URL Metric storage locking by default for administrators Jan 29, 2025
@felixarntz
Copy link
Member

@westonruter I'm not sure about this change. This seems to be something that helps development, so I think having it in the OD Dev Mode plugin makes more sense to me than making it the default for any administrator. Many administrators of production sites are probably not developers.

On another note, I think the code snippet referenced in the PR description is for something else than described (it mentions viewport, rather than storage lock)?

@westonruter
Copy link
Member Author

On another note, I think the code snippet referenced in the PR description is for something else than described (it mentions viewport, rather than storage lock)?

Oops. I copied the wrong code. This has been corrected.

I'm not sure about this change. This seems to be something that helps development, so I think having it in the OD Dev Mode plugin makes more sense to me than making it the default for any administrator. Many administrators of production sites are probably not developers.

Actually, it's not specifically to help development. If someone creates a new post and then views it on the frontend, this should result in a URL Metric being collected right away without having to wait for any TTL storage lock to expire for the author or for another visitor to come by to view the post. A post author will almost always want to view the published post on the frontend after publishing, so this will result in a URL Metric being stored right away to give the collection a head start. Additionally, an administrator browsing around their site should be able to submit URL Metrics as they go around without having to wait for the storage lock TTL to expire. The purpose of the storage lock is to guard against unlimited DB writes from a frontend visitor. This, however, is not relevant to administrators (or even other roles above subscriber) since they can write to the DB at will in the admin.

Actually, come to think of it, manage_options probably isn't the right capability here, now that I think about it. In reality it should be edit_posts. It's about whether or not a user has "write access" to the database.

@felixarntz
Copy link
Member

Your rationale makes sense to me. In that case, maybe a better implementation to achieve that goal would be to invalidate the current user's storage lock transient whenever a post is updated? That way, any fresh content would be able to receive updates immediately, without e.g. "blasting" the server with tons of URL metrics (in case of a very unpopular site, where the administrator visits e.g. 15 URLs quickly to test stuff).

@westonruter
Copy link
Member Author

I'm not sure that would work. The storage lock TTL is sent to the client to prevent the client from even attempting to make a POST request to store a new URL Metric.

I don't see a problem with allowing an administrator to make multiple DB writes quickly. Consider bulk actions in the admin or quickly moderating through comments, for example.

@westonruter
Copy link
Member Author

Oh, I just realized this won't work as-is since the _wpnonce isn't being included for logged-in users, so the endpoint at present would always see the user as anonymous. Easy to fix.

@westonruter
Copy link
Member Author

Oh, I just realized this won't work as-is since the _wpnonce isn't being included for logged-in users, so the endpoint at present would always see the user as anonymous. Easy to fix.

There, this is fixed by 9f89cd4. Note that this could eventually mitigate the need for the Site Health test to check whether anonymous REST API requests are allowed (#1731). We could potentially add a new hook which disables URL Metric collection for anonymous users. This could be attractive to high-traffic sites (cf. #1655) which don't want to expose the REST API to anonymous visitors. Such sites would require logged-in users to collect the URL Metrics by browsing around, but also some automated collection could be implemented in the background, such as by opening the frontend in an IFRAME in the background when saving the post (#1311).

@westonruter
Copy link
Member Author

@felixarntz Anything else you want to see here?

@felixarntz
Copy link
Member

@westonruter Sorry I didn't get to responding back to this sooner.

I'm not sure that would work. The storage lock TTL is sent to the client to prevent the client from even attempting to make a POST request to store a new URL Metric.

But if the storage lock was cleared, the lock wouldn't be there anymore until they send a URL metric again, correct? You mentioned previously as a rationale for this change:

If someone creates a new post and then views it on the frontend, this should result in a URL Metric being collected right away without having to wait for any TTL storage lock to expire for the author or for another visitor to come by to view the post.

For that reason, I suggested to clear the TTL for the current user when a post is updated. If they then visit the post in the frontend, URL metric data for that post would be sent, no? So I'm thinking that would satisfy this use-case without making broader changes as proposed here.

I don't see a problem with allowing an administrator to make multiple DB writes quickly. Consider bulk actions in the admin or quickly moderating through comments, for example.

To me it's less about "allowing" than about the implications on low-quality servers. The examples you're mentioning are still actual actions taking by the user, so there's naturally more of a limitation on how fast the user can perform those. There's a slightly different implication on simply visiting pages, and the administrator (in many cases probably) not knowing that each page load will be a database write.

FWIW I'm not strongly opposed to this change, but based on what you described as rationale, I'm thinking there may be a less risky way to make it possible, as outlined above.

@westonruter
Copy link
Member Author

Storage locks are not bound to specific users. They are bound to a hash of the IP address. So if that lock were to be cleared, then anyone from that IP address could then submit a URL Metric, regardless of whether they are an admin or not.

Even in the case of an admin, however, clearing the storage lock wouldn't necessarily help because the storage lock TTL is still 60 seconds. This lock is stored client-side in sessionStorage to circumvent the user from needlessly attempting POST requests when they should be locked.

Lastly, if the storage lock were only cleared in response to a post update in the DB, then this wouldn't address the goal of an administrator being able to cause URL Metrics to be collected just by browsing the site. Let's say a user goes to their site, goes to create a post and publishes it. Once they do so and view the post on the frontend, a URL Metric should be collected. If the storage lock were cleared in response to editing a post, then good, the URL Metric should successfully be collected. But what if they then go to the homepage to see that post showing up there? The URL Metric wouldn't be collected because the storage lock would be re-imposed. If, however, the storage lock is disabled by default for admins, then URL Metrics will be collected as needed while navigating around the site without restriction. Given that DB writes happen all the time for admin users, I don't see this as an issue. DB writes happen for admins just by visiting the New Post screen and then they happen in the background for autosaves for that screen too.

@felixarntz
Copy link
Member

Storage locks are not bound to specific users. They are bound to a hash of the IP address. So if that lock were to be cleared, then anyone from that IP address could then submit a URL Metric, regardless of whether they are an admin or not.

Good point, I didn't consider that - of course not guaranteed that the IP address is the same person. Fair point also on the sessionStorage consideration.

Lastly, if the storage lock were only cleared in response to a post update in the DB, then this wouldn't address the goal of an administrator being able to cause URL Metrics to be collected just by browsing the site.

I see how that can be valuable, but I'm still wary of unexpected side effects on scalability, depending on a few factors:

  • The administrator sending a database write for more or less every URL they view on the site is probably fine, as it shouldn't have performance implications. Likely 90+% of WordPress sites have 1 administrator, so that'd be okay.
  • You mentioned above though that you would like to change the capability required to edit_posts. In that scenario, the implications would change - I would specifically become worried about e.g. certain membership site setups where it wouldn't be too far fetched to give 1000s of users edit_posts. Probably still would be a specific scenario, but in any case, it certainly changes the impact that this change would have.
  • There is a question on the diversity of URL metrics. Does it matter or not? Maybe it's fine for the vast majority of URL metrics to come from a single user - the viewport implications for the same viewport shouldn't differ wildly. But I want to make sure we consider that before going the route. Because effectively, for sites with very few to no users, implementing this change might lead to a similar result as if we had e.g. some admin UI to automatically visit all URLs as the authenticated user (which we discussed before, but never went for).

@westonruter
Copy link
Member Author

Good points.

  • You mentioned above though that you would like to change the capability required to edit_posts. In that scenario, the implications would change - I would specifically become worried about e.g. certain membership site setups where it wouldn't be too far fetched to give 1000s of users edit_posts. Probably still would be a specific scenario, but in any case, it certainly changes the impact that this change would have.

Yeah, I withdraw my idea to have the capability be edit_posts by default for that reason as well.

  • There is a question on the diversity of URL metrics. Does it matter or not? Maybe it's fine for the vast majority of URL metrics to come from a single user - the viewport implications for the same viewport shouldn't differ wildly. But I want to make sure we consider that before going the route. Because effectively, for sites with very few to no users, implementing this change might lead to a similar result as if we had e.g. some admin UI to automatically visit all URLs as the authenticated user (which we discussed before, but never went for).

This is something I thought about too, where it would be nice if a given user only was allowed to contribute one fresh URL Metric for a given URL. However, in order to do this, we'd have to store some identifier about the user session in the URL Metric to know whether or not we should reject additional URL Metrics from that client. This would be problematic from a privacy perspective, since then the URL Metrics would essentially become a way to track users.

We could possibly have a flag in the URL Metric for whether it was was collected from an authenticated user or not (e.g. userAuthenticated boolean property), which could be useful as a way possibly to differentiate trusted vs untrusted URL Metrics in the future. This flag could also be used to try to ensure that there is only one such fresh userAuthenticated URL Metric collected at a time. But what if it is a membership site in which all users are authenticated? That won't work then. Could we store the authenticated userId in the URL Metric and only allow one fresh URL Metric to be stored at a given time from that user (if it is not zero or null)? This goes back to the privacy risk.

For the sake of diversity of URL Metric collection, if an anonymous user visits the homepage, clicks to view a post for a minute, then goes back to the homepage, and then reads another post for a minute, before going back to the homepage for a third time: this will result in 3 URL Metrics being collected from that same user. Ideally we wouldn't collect multiple URL Metrics from the same user, but this isn't being thwarted currently anyway. I think it's more important for URL Metrics to be collected from the same user over the alternative of not collecting any URL Metric at all.

Nevertheless, here's an alternative approach extending on top of what this PR currently implements. What if we use sessionStorage to keep track of the URLs which the client had submitted URL Metrics for? When detect.js boots up it can check if the currentUrl was already submitted, and if so, short-circuit. This addresses the privacy concern of storing some identifier about the user session with the URL Metric while also ensuring a diversity of clients being the source for the submitted URL Metrics. SG?

@westonruter
Copy link
Member Author

@felixarntz There, what do you think about a57e080? The hashing of the URL in the sessionStorage may be overkill.

@felixarntz
Copy link
Member

  • There is a question on the diversity of URL metrics. Does it matter or not? Maybe it's fine for the vast majority of URL metrics to come from a single user - the viewport implications for the same viewport shouldn't differ wildly. But I want to make sure we consider that before going the route. Because effectively, for sites with very few to no users, implementing this change might lead to a similar result as if we had e.g. some admin UI to automatically visit all URLs as the authenticated user (which we discussed before, but never went for).

I didn't even think that far - so currently, this PR would result in an administrator being able to send all URL metrics necessary to fill a group? That would indeed be undesirable then, since e.g. refreshing the page a few times would already result in an entirely populated group (based on sample size). I guess I hadn't thought of that scenario yet, but that makes the concern even greater.

Nevertheless, here's an alternative approach extending on top of what this PR currently implements. What if we use sessionStorage to keep track of the URLs which the client had submitted URL Metrics for? When detect.js boots up it can check if the currentUrl was already submitted, and if so, short-circuit. This addresses the privacy concern of storing some identifier about the user session with the URL Metric while also ensuring a diversity of clients being the source for the submitted URL Metrics. SG?

There, what do you think about a57e080? The hashing of the URL in the sessionStorage may be overkill.

This would address the above concern right? Looks like a great solution. I think it's okay that a single administrator can provide a single metric per group per URL, so with this additional commit in place this should be a good approach.

@westonruter
Copy link
Member Author

Yes, the current state of the PR prevents any client (either an admin or not) from filling up the sample of URL Metrics. Previously, an anonymous user could have done so by waiting 60 seconds between page reloads, and initially this PR allowed an admin to do so by reloading multiple times even without waiting. Now only one URL Metric will be submitted for both. The difference is that when navigating on to a second URL, the admin will be able to submit a URL Metric for that one right away, whereas for the anonymous user they'll have to wait 60 seconds.

Copy link
Member

@felixarntz felixarntz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonruter This looks good to me now, just one small note that would be good to address. It's doing a bit too much in the map_meta_cap callback.

@westonruter
Copy link
Member Author

@felixarntz I added one more improvement in 78a266e which accounts for the case where another plugin may add a secondary capability to require for users who can od_store_url_metric_now. This additional commit will prevent clobbering any such additional caps.

@felixarntz
Copy link
Member

felixarntz commented Feb 4, 2025

@felixarntz I added one more improvement in 78a266e which accounts for the case where another plugin may add a secondary capability to require for users who can od_store_url_metric_now. This additional commit will prevent clobbering any such additional caps.

I don't think we should do that: Since od_store_url_metric_now is a meta capability based on the current implementation, other developers are not able to grant their own capabilities based on that capability.

If we want to make that possible, we need to instead handle od_store_url_metric_now as a base capability. That can be done via the user_has_cap filter (instead of map_meta_cap). Come to think of it, that's probably actually more appropriate for this capability, since there is nothing "meta" about it (e.g. there's no extra parameter to provide that the capability would depend on).

If you agree with that, we could simply solve it like this:

add_filter(
	'user_has_cap',
	function ( $allcaps ) {
		if ( isset( $allcaps['manage_options'] ) ) {
			$allcaps['od_store_url_metric_now'] = $allcaps['manage_options'];
		}
		return $allcaps;
	}
);

FWIW, even WordPress Core has historically been doing it wrong for a bunch of cases - for some reason, map_meta_cap is widely referenced as the way to do it, but in many cases user_has_cap would actually be more appropriate.

@westonruter
Copy link
Member Author

westonruter commented Feb 4, 2025

I don't think we should do that: Since od_store_url_metric_now is a meta capability based on the current implementation, other developers are not able to grant their own capabilities based on that capability.

I'm not sure I understand. A meta capability maps to one or more primitive capabilities. Developers can add an arbitrary number of additional primitive capabilities to be required for a given meta capability. For example, to prevent a user from being able to do something, a developer can just push the do_not_allow capability into the list of primitive caps that are returned by map_meta_cap(). For current_user_can(), the user must have all of the capabilities returned by map_meta_cap().

In any case, I reverted my commit.

Come to think of it, that's probably actually more appropriate for this capability, since there is nothing "meta" about it (e.g. there's no extra parameter to provide that the capability would depend on).

I don't think that the presence of args is really necessary for map_meta_cap to be used. For example, edit_categories and delete_post_tags always map to manage_categories. The customize cap always maps to edit_theme_options.

Update: After discussing offline, I am largely won over and I've replaced map_meta_cap with user_has_cap.

@felixarntz
Copy link
Member

@westonruter We chatted about this via DM, but replying here too for visibility:

I don't think we should do that: Since od_store_url_metric_now is a meta capability based on the current implementation, other developers are not able to grant their own capabilities based on that capability.

I'm not sure I understand. A meta capability maps to one or more primitive capabilities. Developers can add an arbitrary number of additional primitive capabilities to be required for a given meta capability.

Yes, but if a developer wanted to rely their own meta capability on od_store_url_metric_now granted via map_meta_cap, they would need to make a nested call to map_meta_cap(), which is unnecessarily complex and slow. They only don't have to do it if we include a workaround like you had in the extra commit, but that's also unnecessarily complex - and almost no plugin does that, it's not something safe to rely on.

Come to think of it, that's probably actually more appropriate for this capability, since there is nothing "meta" about it (e.g. there's no extra parameter to provide that the capability would depend on).

I don't think that the presence of args is really necessary for map_meta_cap to be used. For example, edit_categories and delete_post_tags always map to manage_categories. The customize cap always maps to edit_theme_options.

All of these capabilities should really be primitive capabilities. They were probably made to use map_meta_cap, since almost all documentation only focuses on map_meta_cap, when in reality many capabilities would be better suited for user_has_cap. Consider for instance edit_categories: It would not be unreasonable if there was a edit_category meta capability expecting a term ID. Mapping that to edit_categories would be unnecessarily complicated the way it's implemented in Core today.

Obviously most of this can be worked around, but distinguishing between primitive and meta capabilities helps other developers using a plugin to be able to finetune how the (primitive) capabilities are granted.

Copy link
Member

@felixarntz felixarntz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonruter Thanks for bearing with me through the discussions and clarifying your points, this looks good to me now.

Just one non-blocking comment.

Comment on lines +52 to +54
if ( ! is_array( $allcaps ) ) {
$allcaps = array();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point to include this, although technically someone changing this to not be an array would be doing_it_wrong.

It's fine to keep this, but if we already cater for it, I wonder whether we should instead just return the same value in such a scenario. It's basically broken, and we know the condition below won't be satisfied anyway.

If some code does something unexpected, I'd rather just ignore it and pass through, than tamper with what they did wrong and possibly cause more unexpected issues. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do this in many other places actually. It's all for the sake of type safety, as otherwise a fatal error would occur since the function returns an array. This scenario happens when another plugin incorrectly returns the wrong type on a filter callback.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant it mostly as: If someone else returns something other than an array, we could also return that same thing. The documented filter input type is basically always the same as the documented filter output type, so from that perspective we could also change the return type here to array<string, bool>|mixed and pass through any value that's not array<string, bool>.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, although that would require removing the PHP type hint from the function. All of the other filter callbacks in the repo have taken this approach, so if we want to pass through the value unchanged, then all the other callbacks should be updated.

In any case, this is a safeguard for an edge condition for the sake of type safety.

@westonruter westonruter merged commit e793289 into trunk Feb 4, 2025
18 checks passed
@westonruter westonruter deleted the remove/storage-lock-for-admins branch February 4, 2025 20:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Plugin] Optimization Detective Issues for the Optimization Detective plugin [Type] Enhancement A suggestion for improvement of an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Storage lock TTL can be reduced or eliminated for admin users
2 participants