
feat(http-ratelimiting): add a bucket for global rate limits #2159

Open · wants to merge 5 commits into main

Conversation

@greaka commented Feb 19, 2023

This closes #650.

I opted to keep the rate limit trait as is for now, even though is_globally_locked is unused.
Additionally, the global rate limiter waits for each request to return before accepting the next request. Since 50/s isn't that many, I wanted to keep it simple. If you disagree, I can change that.
Lastly, Discord only documents the hardcoded 50 requests per second. Nirn-proxy, however, seems to have figured out that big bots get different global rate limits. We discussed this on Discord and decided not to support that, since it isn't officially documented. EDIT: It was requested that I instead add an option to specify custom ratelimit values manually.
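(For illustration only, and not taken from this PR's diff: configuring custom values might end up looking something like the snippet below; the constructor name and signature are hypothetical.)

```rust
use std::time::Duration;

// Hypothetical usage: a global bucket with manually specified limits instead of
// the documented 50 requests per second.
let global = GlobalBucket::with_ratelimit(Duration::from_secs(1), 50);
```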

@github-actions bot added the c-http-ratelimiting (Affects the http ratelimiting crate) and t-feature (Addition of a new feature) labels on Feb 19, 2023
@vilgotf self-requested a review on February 19, 2023 18:50
@vilgotf (Member) commented Feb 19, 2023

Let's ignore the failing http tests for now.

@zeylahellyer (Member) left a comment

Looks great overall, thank you! I have concerns about the current global bucket loop being a bottleneck, and I think even after fixing that, global ratelimits can still be encountered. I've added some scenarios where I think this implementation can run into global ratelimits, and I've proposed a solution that prevents that possibility.


use self::bucket::{Bucket, BucketQueueTask};
pub use self::global_bucket::GlobalBucket;
Member

Should this be public? I don't see how it can be returned or used through this ratelimiter implementation. If this is something we expect people to use for building other ratelimiters, then we should move it out of the in_memory module to the root.
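If it is meant for building other ratelimiters, the re-export at the crate root could be as simple as the sketch below (the module path is inferred from this diff, not confirmed by the PR):

```rust
// In the crate root (lib.rs), assuming in_memory already re-exports the type:
pub use self::in_memory::GlobalBucket;
```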

let global_ticket_tx = if let Ok(ticket_tx) = self.wait_for_global().await {
ticket_tx
} else {
continue;
Member

It seems like this shouldn't happen. The global bucket lives at least as long as this task since each queue has a strong reference to it.

But let's say a panic happens in the task driving the global bucket, such as an arithmetic panic (e.g. divide by zero) or a panic in some underlying dependency, such as an undocumented panic when calling tokio::time::sleep. When that happens, wait_for_global will only ever return an error, since there's no receiving end to send a TicketSender back. We then continue the loop, and the cycle starts again. All requests to the ratelimiter from outside parties will start receiving a mysterious error saying the ratelimiter dropped the request, since they never get the sending channel back (by calling queue_tx.available() here) and the sending half of the channel gets dropped.

This isn't great, so we have two options: explicitly drop the queue, "stop" the ratelimiter, and return an appropriate error to fail all requests; or allow all requests to pass through without a global ratelimiter. The second is more shocking, so we should probably default to that, but make the behavior customizable as a method on the ratelimiter.
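Purely as an illustration of making that behavior customizable (the enum, field, and method names below are invented for this sketch and are not part of the PR):

```rust
/// Hypothetical policy for what the ratelimiter does if the global bucket task dies.
pub enum GlobalBucketFailure {
    /// Fail every subsequent request with an explicit error.
    FailRequests,
    /// Let requests through without global ratelimiting (the proposed default).
    Passthrough,
}

impl InMemoryRatelimiter {
    /// Hypothetical builder-style setter; `global_failure` is an assumed new field.
    pub fn global_bucket_failure(mut self, behavior: GlobalBucketFailure) -> Self {
        self.global_failure = behavior;
        self
    }
}
```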

Comment on lines 72 to 118
async fn run_global_queue_task(bucket: Arc<InnerGlobalBucket>, period: u64, requests: u32) {
let mut time = Instant::now();

while let Some(queue_tx) = bucket.queue.pop().await {
wait_if_needed(bucket.as_ref(), &mut time, period, requests).await;

let ticket_headers = if let Some(ticket_headers) = queue_tx.available() {
ticket_headers
} else {
continue;
};

if let Ok(Some(RatelimitHeaders::Global(headers))) = ticket_headers.await {
tracing::debug!(seconds = headers.retry_after(), "globally ratelimited");

bucket.is_locked.store(true, Ordering::Release);
tokio::time::sleep(Duration::from_secs(headers.retry_after())).await;
bucket.is_locked.store(false, Ordering::Release);
}
}
}
@zeylahellyer (Member) commented Feb 25, 2023

Let's take a look at the flow of the logic here:

  1. global queue task waits for an item in the queue, which happens on every ratelimiter request
  2. the global ratelimiter waits if needed
  3. the global queue notifies the other half they can send the request
    a. if the other half is still around, the global queue receives a receiving half to receive headers from
    b. otherwise, the global queue can know the other half doesn't exist and loops over to the next request
  4. the global queue waits for the headers from the other half; if the headers are of a global ratelimit then it sleeps, causing the entire system to stall, as intended
  5. it loops over

In terms of preventing more than a configured number of requests in a given period, this system of preemptive sleeping works great. The problem is that it is synchronous: only one request is processed at a time. Practically speaking, this means the global bucket won't limit to, say, 50 requests/1 second, but to a much smaller number, because the global bucket is a bottleneck between each request. Requests to ping Discord have an average latency of 102ms right now according to the status page, so functionally the global bucket will bottleneck users to fewer than 10 requests/second (1 s / 102 ms ≈ 9.8), if not much fewer depending on the routes in use.

We need to change the implementation here to solve this. My first instinct is that hitting a global ratelimit header is unlikely with the preemptive ratelimiter in place, so instead of awaiting each response inline we could spawn a task for each set of headers. With a mutex in place of the is_locked boolean, that task could lock up the ratelimiter for the duration of a global ratelimit. That works.
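A rough sketch of that idea, reusing the names from the diff above and assuming InnerGlobalBucket swaps its is_locked AtomicBool for a global_lock: tokio::sync::Mutex<()> (this is not the PR's code):

```rust
async fn run_global_queue_task(bucket: Arc<InnerGlobalBucket>, period: u64, requests: u32) {
    let mut time = Instant::now();

    while let Some(queue_tx) = bucket.queue.pop().await {
        // If a spawned task below is sleeping off a global 429, this blocks until
        // the retry_after period has elapsed before the next ticket is granted.
        drop(bucket.global_lock.lock().await);

        wait_if_needed(bucket.as_ref(), &mut time, period, requests).await;

        let ticket_headers = if let Some(ticket_headers) = queue_tx.available() {
            ticket_headers
        } else {
            continue;
        };

        // Await the response headers off the hot path so the loop is no longer
        // serialized on Discord's response latency.
        let bucket = Arc::clone(&bucket);
        tokio::spawn(async move {
            if let Ok(Some(RatelimitHeaders::Global(headers))) = ticket_headers.await {
                tracing::debug!(seconds = headers.retry_after(), "globally ratelimited");

                let _guard = bucket.global_lock.lock().await;
                tokio::time::sleep(Duration::from_secs(headers.retry_after())).await;
            }
        });
    }
}
```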

But this has its own flaw: if a user makes, say, 100 requests to the ratelimiter with a global bucket of 50 requests/1 second, then across just over 1 second they'll receive all 100 request allotments. If the requests don't all take the same time (very likely), or they get stalled, or something else happens, then users could fire off more than 50 requests in a given second and still hit the global ratelimit. It's also impossible to determine which second Discord will count a request against. We can see how this could fail under this circumstance:

  1. User requests 100 ratelimiter tickets for requests, current configuration is 50 requests/second
  2. 50 tickets are granted in the first second
  3. 40 are processed by Discord in that second
  4. 50 tickets are granted in the second second
  5. those 50, plus the 10 from the previous second, are processed by Discord
  6. that was 60 in a second -- oops, global ratelimit hit

The solution I see is to limit the number of tickets allotted in a given second and the number of still-in-flight requests, combined, to the configured number of requests. We can see why this prevents the previous scenario:

  1. User requests 100 ratelimiter tickets for requests, current configuration is 50 requests/second
  2. First second
    1. 50 tickets are granted in the first second and their requests are sent to Discord
    2. 40 are processed by Discord in that second
    3. 30 requests get responses and are processed by the global queue
  3. Second.. second
    1. Because 20 requests are still in flight and their global ratelimit status is indeterminate, 30 tickets are granted
    2. the responses of the 10 requests that were processed by Discord but didn't receive responses are received
    3. the 10 requests that weren't processed by Discord last second are processed and return responses
    4. 20 of the requests from this second are received
    5. 10 are not, so there are 10 requests still in flight
  4. Third second
    1. because 10 requests are in flight, there are 40 tickets allotted for the second
    2. the cycle continues
  5. it's mathematically and logically impossible to hit a global ratelimit

To accomplish this, my suggestion is to use a combination of tracking the time (like now), spawning tasks to process responses, and a Tokio semaphore with the configured number of requests as allotments. When one second rolls over into the next, the first request processed in the new second can add allotments for the number of responses that were received during the previous second. If you want tips on how to implement this, let me know and I can provide them; it's a bit complicated.
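A minimal sketch of that combination, assuming tokio; the GlobalLimiter type and its method names are invented here and would still need to be wired into the queue task and a per-period timer:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

use tokio::sync::Semaphore;

/// Sketch: caps (tickets granted this period) + (requests still in flight)
/// at the configured number of requests per period.
struct GlobalLimiter {
    /// One permit per request that may be started; starts at e.g. 50.
    permits: Semaphore,
    /// Responses that came back during the current period.
    finished_this_period: AtomicUsize,
}

impl GlobalLimiter {
    fn new(requests: usize) -> Self {
        Self {
            permits: Semaphore::new(requests),
            finished_this_period: AtomicUsize::new(0),
        }
    }

    /// Called before granting a ticket: consume one allotment. `forget` keeps
    /// the permit consumed until `rollover` hands it back.
    async fn acquire(&self) {
        self.permits
            .acquire()
            .await
            .expect("semaphore is never closed")
            .forget();
    }

    /// Called from a spawned per-response task once the headers arrive.
    fn on_response(&self) {
        self.finished_this_period.fetch_add(1, Ordering::Relaxed);
    }

    /// Called once per period (e.g. from a tokio::time::interval): hand back as
    /// many allotments as responses completed during the last period.
    fn rollover(&self) {
        let finished = self.finished_this_period.swap(0, Ordering::Relaxed);
        self.permits.add_permits(finished);
    }
}
```

In the scenario above this grants 50 tickets in the first second and only 30 in the next (because 30 responses came back), keeping granted-plus-in-flight at or below 50.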

Let me know if I'm thinking about this wrong, but this implementation of a dual timer and semaphore seems like the right way of doing this.

Member

The current time tracker is inefficient: wait_if_needed will almost always sleep and yield one request every period / requests, evenly spacing them out. We could either use a leaky bucket algorithm (we could take a dependency on a crate implementing one) or mirror the implementation of twilight_gateway::CommandRatelimiter. The ratelimiter in the gateway was quite tricky, so I'd recommend adding lots of tests for this module.

@greaka (Author)

Are you sure that Discord resets every second, as opposed to evenly allowing more? In that case this way would allow requests to complete earlier.

@zeylahellyer (Member) commented Feb 25, 2023

Are you sure that Discord resets every second, as opposed to evenly allowing more? In that case this way would allow requests to complete earlier.

I misremembered, thanks for the catch. I don't believe the docs explicitly state this, but it is a GCRA implementation, so it's actually a leaky bucket that will replenish over time.
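For reference, a minimal GCRA-style pacer could look roughly like the sketch below; it assumes Discord's limiter behaves like a GCRA with the same period and request count, and it is not the PR's code:

```rust
use tokio::time::{sleep_until, Duration, Instant};

/// Allows bursts of up to `requests` and then replenishes one slot every
/// `period / requests`, instead of resetting once per period.
struct Gcra {
    /// Theoretical arrival time of the next request ("TAT" in GCRA terms).
    tat: Instant,
    /// Emission interval: period / requests.
    increment: Duration,
    /// Burst tolerance: how far TAT may run ahead of real time.
    tolerance: Duration,
}

impl Gcra {
    fn new(period: Duration, requests: u32) -> Self {
        let increment = period / requests;
        Self {
            tat: Instant::now(),
            increment,
            tolerance: period - increment,
        }
    }

    /// Wait until sending one more request conforms, then account for it.
    async fn acquire(&mut self) {
        let now = Instant::now();
        if self.tat > now + self.tolerance {
            // Sending now would exceed the burst allowance; sleep it off.
            sleep_until(self.tat - self.tolerance).await;
        }
        self.tat = self.tat.max(Instant::now()) + self.increment;
    }
}
```

With period = 1s and requests = 50, an idle bot can burst roughly 50 requests and then gets one new slot every 20ms, rather than always spacing requests 20ms apart.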

ticket_tx
} else {
continue;
};
Member

let-else once we upgrade to rust 1.65 🙏
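For reference, the snippet above could then read roughly like this:

```rust
let Ok(global_ticket_tx) = self.wait_for_global().await else {
    continue;
};
```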

let mut rx = self.rx.lock().await;

timeout(timeout_duration, rx.recv()).await.ok().flatten()
rx.recv().await
Member

I am somewhat certain that this will never return None (since at least one sender is tied to each BucketQueue), even though returning None is necessary for the bucket to be finished and freed (see BucketQueueTask::run).

Labels: c-http-ratelimiting (Affects the http ratelimiting crate), t-feature (Addition of a new feature)

Successfully merging this pull request may close these issues:

Pre-emptively halt requests to not encounter global ratelimit

3 participants