docs: add a guide on Batching in Tailcall #97

Closed · wants to merge 8 commits
189 changes: 189 additions & 0 deletions docs/guides/batching.md
---
title: "Batching"
---

One of the ways developers can handle the dreaded [N+1 problem](https://tailcall.run/docs/guides/n+1/) is to use batching.

Batching, when configured correctly, can significantly reduce strain on high-traffic backends. You only need to add a handful of operators (i.e. custom directives) to your GraphQL schema for Tailcall to do most of the heavy lifting for you[^1].

## Scenario
**Contributor:**
We need some introduction before describing the scenario; it feels like a steep turn here, and it's hard to catch what this is about.

**Contributor (Author):**
Thanks for your remarks. What would you suggest here?

The original prompt by Tushar was:

> Write a guide about the batching capabilities of Tailcall. Consider a food delivery application where a lot of users are requesting the same data. Batching could be implemented based on GEO Location / City / Locality etc. Emphasize on the impact it has on performance and how it can help scale. Consider the backend to be in REST.

**Contributor (Author):**
@meskill I've addressed your remarks by adding a preceding sentence. Please refer to commit a660f84.

**Contributor:**
- Reduce the drama about "perks", "remote work" and the "slack bot".
- Talk about the traffic surge during lunch hours. This is not just about orders, but also about users in the same GEO location requesting the same data again and again, and the upstream services serving literally the same data to everyone. Instead, an orchestrator that sits between the frontend and the upstream services could very efficiently batch these requests, reducing the load massively.

**Contributor (Author):**
@tusharmath if the upstream is repeatedly serving the same data to different users, then the obvious solution here is caching, not batching.

This is why I didn't dwell on concurrent users but on a single user performing multiple requests.

To my knowledge, there’s no way for Tailcall to batch requests from multiple users (read multiple simultaneous requests) into a single upstream request.

**Contributor:**

> if the upstream is repeatedly serving the same data to different users, then the obvious solution here is caching, not batching.

Not necessarily; caching is only relevant when the data isn't changing. With batching enabled, we can serve multiple people the same data without requesting it multiple times upstream, and we can do it even if the data isn't cacheable.

> To my knowledge, there's no way for Tailcall to batch requests from multiple users (read multiple simultaneous requests) into a single upstream request.

Check `@server.batchRequests`; it allows Tailcall to batch multiple requests.

**Contributor (Author):**
@tusharmath
I feel like we are talking past each other here. Tailcall has two notions of batching but based on your feedback, it is not really clear which one you are asking for.

**Inbound Batching**

My understanding of batching in this context is similar to the concept of multiplexing in HTTP/2, as illustrated by the diagram in this SO answer.

As an example, an end user sends 5 different GraphQL requests in quick succession to the Tailcall server (this might happen on app load when trying to hydrate multiple view components). Because the 5 inbound requests were received within a few milliseconds of each other, the Tailcall server can decide to queue all 5 requests and send only 1 combined response to the end user, if the server was started with the `batchRequests` option set to `true`.
This type of batching reduces the number of network round-trips from 5 responses down to only 1.
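
For illustration, a minimal sketch of what such an inbound batch could look like on the wire, assuming the server is started with `@server(batchRequests: true)` and accepts a JSON array of GraphQL operations (the endpoint and queries are only illustrative):

```bash
# Hypothetical inbound batch: two GraphQL operations combined into one HTTP request.
# Assumes the schema sets @server(batchRequests: true); the queries reuse the
# User fields from the example later in this guide.
curl -s http://127.0.0.1:8000/graphql \
  -H 'Content-Type: application/json' \
  -d '[
        {"query": "{ users { id username } }"},
        {"query": "{ users { id email } }"}
      ]'
# Expected: a JSON array with one result object per operation.
```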

**Outbound Batching**

The Tailcall server needs to make multiple outbound requests to another server using a combination of the `@upstream` operator and the `@http` operator. Instead of making 50 or 100 requests one by one, Tailcall can batch them into a single outbound request. This type of batching is what I understood the original article was asking for, as can be seen in the diagram, examples and description.

The only way to use geo-location to improve response times for millions of end users trying to use a service is to either use geo-routing or introduce a geo-aware CDN that caches identical requests. It was hard to work either of those solutions into the article, which is why I focused on the Slack command example.

In fact, your last sentence in the original issue was "Consider the backend to be in REST", and I asked clarifying questions on the original issue under the assumption that this would be an article about outbound batching, but never got any response.

Essentially, this would mean a complete rewrite of an article that took 2 months just to get feedback ...

**Contributor (Author):**
@tusharmath no response?

**Contributor (Author):**
Hey @tusharmath,
your response is needed on this question, as it affects all of the feedback provided.


Catered lunches and healthy snacks are some of the free perks startups offer to office employees, but with the rise of remote work, many startups are leaving this tradition behind.

One way to keep the tradition going would be a universal meal delivery application in Slack, available as a `/ship-meal` command. Managers, anywhere in the world, could use the command to have lunch delivered to teammates, including those who work from home.

### Multiple Vendors

![image](../../static/images/meal-delivery-app.drawio.png)
**Contributor:**
use excalidraw to create images.


### Constraints

The nature of such a service could cause traffic spikes for every upstream vendor needed to fulfil each individual order, so some care has to be taken when designing the meal app's API.
For instance, if thousands of managers across the world use the command at the same time (just before lunch break) to place team orders, the sudden traffic spike will spread to upstream vendors, leading to delayed or failed orders.
**Contributor:**
Suggested change: "For instance, if **millions of people** across the world use the command at the same time (just before lunch break) to place team orders, the sudden traffic spike will spread to upstream vendors, leading to delayed or failed orders."


_Batching_ is one technique that can be used to avoid overwhelming upstream servers with too many simultaneous requests. Batching combines multiple operations into a bulk operation that is sent in a single request.

Tailcall supports batching via two [operators](https://tailcall.run/docs/operators/): `@upstream` and `@http`.
Before we go over this Tailcall feature, we'll briefly review the most common implementations of batching in REST APIs.

---

## Batching in REST APIs

### High Correspondence Between REST and CRUD

In backends that adhere to the REST architectural style, the HTTP methods `POST`, `GET`, `PUT`, and `DELETE` roughly correspond to _Create_, _Read_, _Update_, and _Delete_ operations respectively in the CRUD paradigm as can be seen in the table below.

| HTTP method | CRUD paradigm |
| ----------- | ------------- |
| `POST` | _Create_ |
| `GET` | _Read_ |
| `PUT` | _Update_ |
| `DELETE` | _Delete_ |

This one-to-one correspondence works because CRUD and REST typically deal with a single entity or resource at a time.

```bash
POST /v1/employees (Create an employee entity)
GET /v1/employees/:id (Read an employee entity)
PUT /v1/employees/:id (Update an employee entity)
DELETE /v1/employees/:id (Delete an employee entity)
GET /v1/employees (Read multiple employee entities)
```

### Low Correspondence Between Batching and CRUD

The one-to-one correspondence doesn't carry over when batching is added to a REST API because batching can either involve:

- performing the same operation on different entities of the same type (e.g. _Update_ the team's order so meals are shipped to employees with ids `1`, `4` & `7`); or
- grouping together different operations in one request (e.g. _Create_ `Jain` as a new employee, _Update_ an employee's meal preferences and _Delete_ meals above a certain price from the menu), as sketched below.
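
A purely hypothetical payload for the second style, grouping unrelated operations into a single request, might look like this (the endpoint and field names are illustrative only):

```bash
# Hypothetical bulk endpoint combining Create, Update and Delete sub-operations
# into one POST request; nothing here refers to a real API.
curl -s -X POST 'https://api.example.com/v1/batch' \
  -H 'Content-Type: application/json' \
  -d '[
        {"method": "POST",   "path": "/v1/employees",   "body": {"name": "Jain"}},
        {"method": "PUT",    "path": "/v1/employees/4", "body": {"mealPreference": "vegetarian"}},
        {"method": "DELETE", "path": "/v1/menu-items?priceAbove=30"}
      ]'
```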

### Real-world Examples of Batching
**Contributor:**
Table isn't readable because we have too many columns. Flatten it into multiple paras.


The table below condenses the most common URL styles used to implement batching in the real world; minimal request sketches for the _Read_ styles follow the table.

| Operation | HTTP method | URL style | Parameters | Content type | Example |
| --------- | ----------- | ----------------------------- | ---------------- | ------------------ | ------------------------------------------------------------------------------------------------ |
| _Read_ | | | | | |
| | `GET` | 1. `/users?id=1&id=4&id=7` | URL query params | - | [github.com](https://github.com/search?q=user%3Adefunkt+user%3Aubuntu+user%3Amojombo&type=users) |
| | `GET` | 2. `/users/1,4,7` | URL path params | - | [ipstack.com](https://ipstack.com/documentation#bulk) |
| | `POST` | 3. `/users/` | Request body | `application/json` | [ipinfo.io](https://ipinfo.io/developers/advanced-usage#batching-requests) |
| _CRUD_ | | | | | |
| | `POST` | 4. `/users?batch` | Request body | `application/json` | [facebook.com](https://developers.facebook.com/docs/graph-api/batch-requests) |
| | `POST` | 5. `/batch/submitJob` (async) | Request body | `application/json` | [arcgis.com](https://developers.arcgis.com/rest/services-reference/enterprise/batch-geocode.htm) |
| | `POST` | 6. `/batch/` | Request body | `multipart/mixed` | [google.com](https://cloud.google.com/storage/docs/batch) |
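
To make the three _Read_ styles concrete, here are minimal request sketches (the host, ids and payloads are hypothetical and not taken from the linked services):

```bash
# Style #1: ids passed as repeated query parameters
curl -s 'https://api.example.com/users?id=1&id=4&id=7'

# Style #2: ids packed into a single path segment
curl -s 'https://api.example.com/users/1,4,7'

# Style #3: ids submitted in a JSON request body via POST
curl -s -X POST 'https://api.example.com/users/' \
  -H 'Content-Type: application/json' \
  -d '["1", "4", "7"]'
```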

### Batching: Sync or Async
**Contributor:**
Some context about error handling should also be added to this guide.


REST APIs that support batching can either follow a synchronous or asynchronous style depending on the underlying operation.

The sync style is the most common and is often used for short-lived operations, i.e. operations that can be completed quickly enough for the server to return an immediate response (`200 OK`). In fact, out of the 6 different URL styles shown in the table above, only URL style #5 is asynchronous; the rest are synchronous.

The async style is used in situations where an operation can take a considerable amount of time to complete. The server will process the request asynchronously but will return an immediate response (`202 Accepted`)[^202-Accepted] instead of letting the client wait.

### Receiving an Async Response: Pull or Push

Within the async request style, there are two ways of retrieving a response from the server: pull or push.

In the pull model, the client periodically polls the server to check whether the operation has completed successfully or failed.

In the push model, when the operation is complete, the server pushes the results to the client over an existing subscription, such as a WebSocket (for browser-based clients) or a webhook (for server-based clients).[^delivery]
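
As a rough sketch of the pull model (hypothetical endpoints and payloads, loosely modelled on URL style #5 above):

```bash
# Submit a long-running batch job; the server answers immediately with 202 Accepted
# and a job id instead of the final result.
curl -s -X POST 'https://api.example.com/batch/submitJob' \
  -H 'Content-Type: application/json' \
  -d '{"operations": [{"method": "GET", "path": "/v1/employees/1"}]}'
# => 202 Accepted  {"jobId": "42", "status": "pending"}

# Poll periodically until the job reports success or failure.
curl -s 'https://api.example.com/batch/jobs/42'
# => 200 OK  {"jobId": "42", "status": "completed", "results": [...]}
```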

### Open Data Protocol

URL style #6 in the table above uses a `Content-Type` of `multipart/mixed`, which makes it the most flexible way of implementing batching in REST APIs. It allows clients to submit arbitrary operations (multiple _Create_, _Read_, _Update_, and _Delete_ operations, each with its own `Content-Type`) in a single request, though most services enforce a limit[^batch-size-limit], called the batch size, typically in the range of 10-1000. The batch size is the number of sub-requests that can be included in a single request to an endpoint that supports batching.
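
As a trimmed-down, hypothetical sketch of the format (real services such as the Google Cloud Storage batch endpoint require exact boundaries, part headers and CRLF line endings):

```bash
# Hypothetical multipart/mixed batch: one GET and one DELETE sub-request,
# each part carrying its own Content-Type. Boundary, host and paths are illustrative.
curl -s -X POST 'https://api.example.com/batch/' \
  -H 'Content-Type: multipart/mixed; boundary=batch_42' \
  --data-binary @- <<'EOF'
--batch_42
Content-Type: application/http

GET /v1/employees/1 HTTP/1.1

--batch_42
Content-Type: application/http

DELETE /v1/menu-items/9 HTTP/1.1

--batch_42--
EOF
```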

---

## Batching in Tailcall
**Contributor:**
I think the head of the document is generally talking about REST & Batching. The jump to Tailcall is quite sudden. I would split this doc into two separate ones. The tailcall specific bits should be in the guide, the remaining stuff can be on our blogs.


Tailcall supports batching `GET` requests in REST APIs that follow the design in URL style #1 in the table above. The batch size is configurable and can be set via `@upstream(... batch.maxSize)`.

Let's now return to our meal delivery app to illustrate how it works.

### Meal Prep and Delivery

Before meals are prepped, the meal delivery app will first check how many meals it will need to make for each company and where each employee's meal will be delivered. Since employees may sometimes switch between working at the office, at a co-working space or from home, the app tries to estimate each employee's current location by geolocating their IP address.

```graphql showLineNumbers
schema
  @server(port: 8000, graphiql: true)
  # highlight-start
  @upstream(baseURL: "https://geoip-batch.fly.dev", httpCache: true, batch: {delay: 1, maxSize: 100}) {
  # highlight-end
  query: Query
}

type Query {
  users: [User]! @http(path: "/users")
}

type User {
  id: Int!
  username: String!
  email: String!
  phone: String
  ip: String!
  # highlight-start
  country: Country! @http(path: "/batch", query: [{key: "query", value: "{{value.ip}}"}], groupBy: ["query"])
  # highlight-end
}

type Country {
  query: String
  country: String
  regionName: String
  city: String
  lat: Float
  lon: Float
}
```

A lot of geolocation services support batch requests to save on network round-trips. The sample GraphQL configuration above shows how to look up the locations of multiple employees using only one batch request.

The Tailcall [`@upstream`](https://tailcall.run/docs/operators/upstream/) operator exposes several properties that allow developers to control various aspects of the upstream server connection, including how requests are batched.

In our example above, I enabled HTTP caching by setting the [`httpCache`](https://tailcall.run/docs/operators/upstream/#httpcache) property to `true`, since it defaults to `false`.

I also configured the [`batch`](https://tailcall.run/docs/operators/upstream/#batch) object, which controls batching. I set `delay: 1`, indicating a delay of 1 millisecond between each batch request (to avoid getting throttled by the upstream server), and `maxSize: 100`, indicating that Tailcall can issue up to `100` sub-requests as part of a single batch request.
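
Assuming the schema above is saved as `meal-app.graphql` (the filename is arbitrary), the server can be started with the Tailcall CLI:

```bash
# Start Tailcall with the configuration above; the filename is an assumption.
tailcall start ./meal-app.graphql
```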

When you run the following GraphQL query:

```graphql
{
  users {
    id
    username
    email
    phone
    ip
    country {
      query
      country
      regionName
      city
      lat
      lon
    }
  }
}
```

Tailcall will produce the following log output:

```bash
[2024-01-22T13:57:33Z INFO tailcall::cli::tc] N + 1: 0
[2024-01-22T13:57:33Z INFO tailcall::http] 🚀 Tailcall launched at [127.0.0.1:8000] over HTTP/1.1
[2024-01-22T13:57:33Z INFO tailcall::http] 🌍 Playground: http://127.0.0.1:8000
[2024-01-22T13:58:04Z INFO tailcall::http::client] GET https://geoip-batch.fly.dev/users HTTP/1.1
[2024-01-22T13:58:05Z INFO tailcall::http::client] GET https://geoip-batch.fly.dev/batch?query=100.159.51.104&query=103.72.86.183&query=116.92.198.102&query=117.29.86.254&query=137.235.164.173&query=141.14.53.176&query=163.245.232.27&query=174.238.43.126&query=197.37.13.163&query=205.226.160.3&query=25.207.107.146&query=29.82.54.30&query=43.20.78.113&query=48.30.193.203&query=49.201.206.36&query=51.102.180.216&query=53.240.20.181&query=59.43.194.22&query=71.57.235.192&query=73.15.179.178&query=74.80.53.208&query=75.75.234.243&query=78.170.185.120&query=78.43.74.226&query=82.170.69.15&query=87.213.156.73&query=90.202.216.39&query=91.200.56.127&query=93.246.47.59&query=97.11.116.84 HTTP/1.1
```

The `/users` endpoint returns a total of 30 users. As you can see in the output, Tailcall constructed a single batch request that concatenates the IP addresses of all 30 users, rather than making 30 individual requests to the geolocation service.

Batching is an optimization technique for mitigating the [N+1 problem](https://tailcall.run/docs/guides/n+1/), as it can significantly reduce the number of network round-trips needed to fulfil a request when one or more upstream servers are involved.

[^1]: To take full advantage of batching, the REST backends being proxied with Tailcall must themselves have support for batching i.e. they must support the ability to combine multiple individual requests into a single request.
[^delivery]: https://news.ycombinator.com/item?id=28392042
[^202-Accepted]: https://www.mscharhag.com/api-design/bulk-and-batch-operations
[^batch-size-limit]: https://www.codementor.io/blog/batch-endpoints-6olbjay1hd#other-considerations-for-batch-processing
Binary file added static/images/meal-delivery-app.drawio.png