feat(flags): add quota limiting for feature flags served up with /decide and to local_evaluation #28564

Open
wants to merge 26 commits into master

Conversation

dmarticus
Contributor

@dmarticus dmarticus commented Feb 11, 2025

Problem

Part 1 of 2 for issue #28223: this change adds quota limiting to the /decide and /local_evaluation endpoints so that we stop serving flags to users who are not paying for them.

A few major things to note before I explain more:

  • This quota limiting feature is gated behind an environment variable (right now just a boolean), which means it is off by default until we turn it on. I did this because I don't want to roll out quota limiting to everyone without also sharing documentation around it (and potentially reaching out to some power users directly); I don't want people to be shocked that their flag requests start to fail.
  • To that point, I'm also open to making the environment variable that controls quota limiting a list of team IDs (rather than a boolean), so that we can toggle it on or off for certain teams rather than rolling this out to everyone. I just wanted to get my idea out so folks could debate it, and this change is an easy one to make if folks want it.
  • Once we're aligned on my implementation, I'm gonna write the docs about this feature that we can point our users at.
  • This change won't go live until I've updated every SDK to handle the quota-limited response. Those changes will come next.

Changes

This change uses a lot of the existing infrastructure for quota limiting recordings, so not a lot of code needed to be changed. I just did the following:

  • added FEATURE_FLAGS_EVALUATED as a quota-limited resource
  • set the overage buffer (i.e. how many resources they're allowed to go over before our requests start erroring out) to zero (open to discussing this)
  • added a conditional block to /decide that, if the environment variable that turns on quota limiting is set, returns a quota-limited flag response like the one below (a rough sketch of the gate follows the example)
{
    "quotaLimited": ["feature_flags"],
    "featureFlags": {},
    "errorsWhileComputingFlags": True,
    "featureFlagPayloads": {},
}
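
For context, here's a rough sketch (not the exact diff) of what that gate could look like in the decide flow. It assumes the quota-limiting helpers that show up elsewhere in this PR (list_limited_team_attributes, QuotaResource.FEATURE_FLAGS, QuotaLimitingCaches.QUOTA_LIMITER_CACHE_KEY) and the extracted compute_feature_flags function; the exact names and wiring here are illustrative, not the final implementation:

from django.conf import settings

from ee.billing.quota_limiting import (
    QuotaLimitingCaches,
    QuotaResource,
    list_limited_team_attributes,
)


def get_feature_flags_response(request, team, data, api_version):
    # If the rollout env var is on, check whether this team's token is quota limited
    if settings.DECIDE_FEATURE_FLAG_QUOTA_CHECK:
        limited_tokens = list_limited_team_attributes(
            QuotaResource.FEATURE_FLAGS, QuotaLimitingCaches.QUOTA_LIMITER_CACHE_KEY
        )
        if team.api_token in limited_tokens:
            # Over quota: return empty flags and mark the response as quota limited
            return {
                "quotaLimited": ["feature_flags"],
                "featureFlags": {},
                "errorsWhileComputingFlags": True,
                "featureFlagPayloads": {},
            }
    # Otherwise compute flags as before (compute_feature_flags is the function extracted in this PR)
    return compute_feature_flags(request, team, data, api_version)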

I also refactored the logical flow in /decide a bit (which mostly consisted of me pulling out the flag computation logic into a separate function). This makes it IMO a bit easier to read, and it'll also make it easier for me to redirect the flag computation to a separate service when I'm testing the new service in production.

Contributor

@greptile-apps greptile-apps bot left a comment

PR Summary

This PR adds quota limiting functionality to the /decide endpoint to restrict feature flag access for teams exceeding usage limits, with controlled rollout capabilities.

  • Added DECIDE_FEATURE_FLAG_QUOTA_CHECK setting in /posthog/settings/web.py to control quota limiting rollout
  • Extracted flag computation into compute_feature_flags() in /posthog/api/decide.py for better maintainability
  • Added FEATURE_FLAGS to QuotaResource enum in /ee/billing/quota_limiting.py with zero overage buffer
  • Returns structured response with empty flags when quota limited: {"quotaLimited": ["feature_flags"], "featureFlags": {}}
  • Added comprehensive test coverage in /posthog/api/test/test_decide.py for quota limiting scenarios

5 file(s) reviewed, 1 comment(s)

ee/billing/test/test_quota_limiting.py (comment outdated; resolved)
@@ -56,6 +57,7 @@ class QuotaLimitingCaches(Enum):
QuotaResource.EVENTS: 0,
QuotaResource.RECORDINGS: 1000,
QuotaResource.ROWS_SYNCED: 0,
QuotaResource.FEATURE_FLAGS: 0, # TODO: should we have a buffer here?
Contributor Author

I'm open to calling this feature_flags_evaluated instead; that might be clearer (I like how rows_synced is more obvious than just events).

Contributor

I think FEATURE_FLAGS_EVALUATED makes sense. That way it's not confused with how many feature flags a team has, which I assume is NOT quota limited?

Comment on lines -318 to -348
    active_flags = {key: value for key, value in feature_flags.items() if value}

    if api_version == 2:
        flags_response["featureFlags"] = active_flags
    elif api_version >= 3:
        # v3 returns all flags, not just active ones, as well as if there was an error computing all flags
        flags_response["featureFlags"] = feature_flags
        flags_response["errorsWhileComputingFlags"] = errors
        flags_response["featureFlagPayloads"] = feature_flag_payloads
    else:
        # default v1
        flags_response["featureFlags"] = list(active_flags.keys())

    # metrics for feature flags
    team_id_label = label_for_team_id_to_track(team.pk)
    FLAG_EVALUATION_COUNTER.labels(
        team_id=team_id_label,
        errors_computing=errors,
        has_hash_key_override=bool(data.get("$anon_distinct_id")),
    ).inc()

    if is_request_sampled_for_logging:
        logger.warn(
            "DECIDE_REQUEST_SUCCEEDED",
            team_id=team.id,
            distinct_id=distinct_id,
            errors_while_computing=errors or False,
            has_hash_key_override=bool(data.get("$anon_distinct_id")),
        )
else:
    flags_response["featureFlags"] = {}
Contributor Author

@dmarticus dmarticus Feb 11, 2025

This code was all moved to get_feature_flags_response.

Contributor

I wish GitHub supported semantic diffs to make it easier to review changes like this. 😆

Contributor Author

Yeah, that'd be so nice. Sometimes I find the side-by-side view helps with it.

Comment on lines 354 to 365
# NOTE: Whenever you add something to decide response, update this test:
# `test_decide_doesnt_error_out_when_database_is_down`
# which ensures that decide doesn't error out when the database is down

if feature_flags:
    # Billing analytics for decide requests with feature flags
    # Don't count if all requests are for survey targeting flags only.
    if not all(flag.startswith(SURVEY_TARGETING_FLAG_PREFIX) for flag in feature_flags.keys()):
        # Sample no. of decide requests with feature flags
        if settings.DECIDE_BILLING_SAMPLING_RATE and random() < settings.DECIDE_BILLING_SAMPLING_RATE:
            count = int(1 / settings.DECIDE_BILLING_SAMPLING_RATE)
            increment_request_count(team.pk, count)
Contributor Author

this code was also moved to compute_feature_flags, since it's only relevant if we actually evaluated flags.

@@ -60,6 +60,9 @@
# if `true` we disable session replay if over quota
DECIDE_SESSION_REPLAY_QUOTA_CHECK = get_from_env("DECIDE_SESSION_REPLAY_QUOTA_CHECK", False, type_cast=str_to_bool)

# if `true` we disable feature flags if over quota
DECIDE_FEATURE_FLAG_QUOTA_CHECK = get_from_env("DECIDE_FEATURE_FLAG_QUOTA_CHECK", False, type_cast=str_to_bool)
Contributor Author

This could be an array of team IDs to either explicitly limit or explicitly exclude from limiting.
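
To make the idea concrete, a purely illustrative sketch (plain os.environ here rather than whatever settings helper we'd actually use; the variable name is hypothetical): the setting holds a comma-separated list of team IDs and the quota check only applies to the listed teams.

import os

# Hypothetical setting, e.g. DECIDE_FEATURE_FLAG_QUOTA_CHECK_TEAMS="2,42,1337"
DECIDE_FEATURE_FLAG_QUOTA_CHECK_TEAMS: list[str] = [
    team_id.strip()
    for team_id in os.environ.get("DECIDE_FEATURE_FLAG_QUOTA_CHECK_TEAMS", "").split(",")
    if team_id.strip()
]


def quota_check_applies_to_team(team_id: int) -> bool:
    # An empty list keeps the check off for everyone; otherwise only listed teams are limited
    return str(team_id) in DECIDE_FEATURE_FLAG_QUOTA_CHECK_TEAMS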

Contributor

Just curious, why wouldn't we control this with our own feature flag?

Contributor Author

I guess we could! I was just following the pattern of the other environment variable controls. We've generally avoided using feature flags that are evaluated within /decide itself, though (this variable is passed into get_decide).

Contributor

We've generally avoided using feature flags that are evaluated within /decide itself

Ha! I guess that could get recursive. I'm not suggesting making this change. Just curious.

@dmarticus dmarticus requested a review from a team February 11, 2025 20:55

@@ -623,8 +646,9 @@ def update_all_orgs_billing_quotas(
     for org_id in orgs_with_changes:
         properties = {
             "quota_limited_events": quota_limited_orgs["events"].get(org_id, None),
-            "quota_limited_recordings": quota_limited_orgs["events"].get(org_id, None),
+            "quota_limited_recordings": quota_limited_orgs["recordings"].get(org_id, None),
Contributor

Is this a bug fix? Has the recordings quota limit been wrong this whole time?

Contributor Author

Oh wow, I didn't even realize I did this; it was a Cursor tab-complete while adding in the quota_limited_flags stuff. I'm gonna tag the replay team so they know about this, since I don't want to change it without them signing off, but it's also possible they weren't even aware this was broken. @PostHog/team-replay you folks want to take a look at this?

Member

🙈 that wouldn't surprise me.

I've never touched quota limiting and don't know how I'd check if it was broken 🤔

More scary: there were no failing tests before or after the change 😱

Member

Turns out it's been that way since it was introduced:

#14340

That PR talks about tracking, so is this just what we're reporting to ourselves, or the actual value we use for limiting?

Contributor Author

lol I wonder if it's because, in the one test where we check the value of that entry, we also assert that it's the same value as the quota_limited_events field:

assert org_action_call.kwargs.get("properties") == {
    "quota_limited_events": 1612137599,
    "quota_limited_recordings": 1612137599,
    "quota_limited_rows_synced": None,
}

Member

🙈

Contributor Author

You nailed it, though; that object is just used in report_organization_action, so it's just for internal reporting:

    for org_id in orgs_with_changes:
        properties = {
            "quota_limited_events": quota_limited_orgs["events"].get(org_id, None),
            "quota_limited_recordings": quota_limited_orgs["recordings"].get(org_id, None),
            "quota_limited_rows_synced": quota_limited_orgs["rows_synced"].get(org_id, None),
            "quota_limited_feature_flags": quota_limited_orgs["feature_flags"].get(org_id, None),
        }

        report_organization_action(
            orgs_by_id[org_id],
            "organization quota limits changed",
            properties=properties,
            group_properties=properties,
        )

So, no actual quota limiting bug, but a quota limit reporting bug!

Member

Are you OK to fix the test as well here? I don't understand the impact, but I'm guessing quota limiting on the correct count is better than not, regardless of whether it's just reporting.

Contributor Author

Oh yeah, I'm on it; there are a bunch of other tests I want to fix too – I need to assert the presence of this new quota limit in way more places lol
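
For what it's worth, a hedged sketch of how that assertion might change once the fields are reported independently – the timestamps here are illustrative, not real fixture values:

# Illustrative only: distinct limited-until values for events vs recordings would have caught
# the copy/paste bug, and the new feature_flags field needs asserting too
assert org_action_call.kwargs.get("properties") == {
    "quota_limited_events": 1612137599,
    "quota_limited_recordings": 1612224000,  # hypothetical distinct value
    "quota_limited_rows_synced": None,
    "quota_limited_feature_flags": None,
}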

Member

awesome... thanks!


@haacked
Contributor

haacked commented Feb 11, 2025

Whoops, I meant to add some comments to my review. It looks good to me, but obviously I'm still learning the system.

This gives me an idea for a possible future feature: a "rate-limited" feature flag. Basically a feature flag with a counter per group; when the counter is hit, the flag returns false.
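
A toy sketch of that idea – nothing to do with this PR, just to make the concept concrete – a per-group counter that flips the flag to false once a budget is exhausted:

from collections import defaultdict


class RateLimitedFlag:
    """Toy illustration: a flag that turns itself off per group after a fixed number of evaluations."""

    def __init__(self, budget_per_group: int):
        self.budget_per_group = budget_per_group
        self.counts: dict[str, int] = defaultdict(int)

    def evaluate(self, group_key: str) -> bool:
        self.counts[group_key] += 1
        # Once the counter is hit, the flag returns False for that group
        return self.counts[group_key] <= self.budget_per_group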

Contributor

@zlwaterfield zlwaterfield left a comment

I don't have a lot of context on the decide/flag stuff but the quota limiting stuff is good 👍. Nice fix on the recordings issue.

    QuotaResource.FEATURE_FLAGS, QuotaLimitingCaches.QUOTA_LIMITER_CACHE_KEY
)
if self.team.api_token in limited_tokens_flags:
    return Response(
Contributor

Should we be responding with an error here so they know they are being limited?

Contributor Author

Yeah, I was on the fence about this too – I know we can't fail decide because of quota limiting and instead have to return a different permutation of the decide response, but maybe with local evaluation I can be more fail-y. I haven't really looked at the SDKs to see how they handle it, but we're already pretty happy to fail local evaluation with a 429 if we hit the customer rate limits, so perhaps it's worth failing for a quota limit too. Good take. Let me try this and see how it feels.
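
Roughly what that could look like in local_evaluation – a sketch assuming DRF's Response and the quota-limiting helpers used above; the exact status code, payload shape, and function name are illustrative and still up for debate:

from rest_framework import status
from rest_framework.response import Response

from ee.billing.quota_limiting import QuotaLimitingCaches, QuotaResource, list_limited_team_attributes


def check_local_evaluation_quota(team):
    # Returns an error response if the team's token is quota limited, otherwise None
    limited_tokens = list_limited_team_attributes(
        QuotaResource.FEATURE_FLAGS, QuotaLimitingCaches.QUOTA_LIMITER_CACHE_KEY
    )
    if team.api_token in limited_tokens:
        return Response(
            {
                "type": "quota_limited",
                "detail": "Feature flag quota exceeded for this project",
            },
            status=status.HTTP_429_TOO_MANY_REQUESTS,
        )
    return None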

@dmarticus dmarticus changed the title feat(flags): add quota limiting for feature flags served up with /decide feat(flags): add quota limiting for feature flags served up with /decide and to local_evaluation Feb 13, 2025
Comment on lines -358 to -365
if feature_flags:
    # Billing analytics for decide requests with feature flags
    # Don't count if all requests are for survey targeting flags only.
    if not all(flag.startswith(SURVEY_TARGETING_FLAG_PREFIX) for flag in feature_flags.keys()):
        # Sample no. of decide requests with feature flags
        if settings.DECIDE_BILLING_SAMPLING_RATE and random() < settings.DECIDE_BILLING_SAMPLING_RATE:
            count = int(1 / settings.DECIDE_BILLING_SAMPLING_RATE)
            increment_request_count(team.pk, count)
Contributor Author

this, and other logging, is moved into _record_feature_flag_metrics
