
Commit 9d1fc34

detailed explanation of extending routelib
vs focusing on the raw API.
1 parent 6c93e45 commit 9d1fc34

1 file changed (+250, -61 lines)

content/features/proxy/api-reference.md

+250 -61
@@ -6,11 +6,15 @@ weight = 200
 
 This page provides a description of and reference to the Memcached built-in proxy API. These allow customization or replacement of the proxy's standard route library in advanced use cases.
 
+Use this reference if you wish to make changes to the standard route library,
+extend it with your own custom route handlers, or write your own fully custom
+library from scratch.
+
 For a general overview of the built-in proxy, see [Built-in proxy]({{<proxy_base_path>}}).
 
 ## Development status of the proxy API
 
-At this stage API functions are mostly stable, but are still subject to
+API functions are mostly stable, but are still subject to
 occasional change. Most changes in functionality will be either additions or
 with a backwards-compatible deprecation cycle.
 
@@ -51,102 +55,289 @@ flowchart TD
 configure --> |for each worker: copy lua code and config table|worker
 ```
 
-The proxy flow starts by parsing a request (ie: `get foo`) and looking for a function hook for this command. If a hook exists, it will call the supplied function. If no hook exists, it will handle the request as though it were a normal memcached.
+## Configuration stages
+
+As noted above, configuring the proxy happens in
+two distinct stages. These stages use fully independent Lua VM instances: a
+configuration instance, and one for each worker thread. This separation makes
+reloading the config safer and improves performance, since a separate
+"configuration thread" is used for more intensive work.
+
+In most cases, if a configuration fails to execute on reload, the proxy will
+continue to run the old configuration.
+
+The same Lua code file is used for all threads: it is loaded and compiled
+once.
 
-In Lua, this looks like: `mcp.attach(mcp.CMD_GET, function)` - Functions are objects and can be passed as arguments. The function is called within a coroutine, which allows us to designs routes procedurally even if they have to make a network call in the middle of executing.
+The first stage is to call `mcp_config_pools` from the "configuration thread".
+This stage defines backends and organizes them into pools. It is also a good
+place to decide how keys and commands should be routed. A table tree of
+information is then returned from this function. The proxy is not blocked or
+otherwise impacted while this function runs.
+
+The second stage is to call `mcp_config_routes` from each "worker thread".
+These VMs run independently for performance reasons: there is no locking
+involved, and any garbage collection happens per-thread instead of for the
+whole process.
+
+In this second stage we take information generated in the first stage to
+create function generator objects, which produce coroutines that actually
+process user requests. We also generate router objects that decide which
+functions should handle a particular command or key prefix. Finally, we attach
+either a function generator or a router object to the proxy's command hooks.
+
+Since this second stage is executed in the same worker threads that handle
+requests, it is a good idea to keep the code simple. Errors here can also
+stop the process and are not easily recoverable.
 
-The function is called with a prototype of:
 ```lua
-function(request)
+function mcp_config_pools()
+    -- create a table of pools, and route information
+    local t = {}
+    -- fill t with pools/information
+    return t
+end
 
+function mcp_config_routes(t)
+    -- Create function generators and routers.
+    local router = -- see below
+    -- Here we override any command. It is also possible to override only
+    -- specific commands, i.e. just override 'set', but handle 'get' as though
+    -- we are not a proxy but a normal storage server.
+    mcp.attach(mcp.CMD_ANY_STORAGE, router)
 end
 ```
 
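
The skeleton above leaves `router` unfilled. As a rough sketch of the raw API without routelib, the two stages could be filled in as follows. It assumes a single memcached backend listening on `127.0.0.1:11212`, and it uses `mcp.backend` and `mcp.pool` from the wider proxy API; their exact argument forms are an assumption here, so check them against the rest of this reference before relying on them.

```lua
-- Sketch only: a minimal config without routelib. The backend address and the
-- mcp.backend / mcp.pool argument forms are assumptions for illustration.

-- Stage one, configuration thread: define backends and pools.
function mcp_config_pools()
    local b1 = mcp.backend("b1", "127.0.0.1", 11212)
    -- Everything returned here is handed to mcp_config_routes on each worker.
    return { main = mcp.pool({ b1 }) }
end

-- Stage two, every worker thread: build a function generator and attach it.
function mcp_config_routes(t)
    local fgen = mcp.funcgen_new()
    -- Reference the pool once; every slot shares this handle.
    local handle = fgen:new_handle(t.main)
    fgen:ready({ n = "direct_main", f = function(rctx)
        -- Per-slot setup would go here; it runs once per slot.
        return function(r)
            -- Per-request work: forward to the pool and return its response.
            return rctx:enqueue_and_wait(r, handle)
        end
    end })
    -- A function generator can be attached directly in place of a router.
    mcp.attach(mcp.CMD_ANY_STORAGE, fgen)
end
```

This attaches a function generator rather than a router, which the text above allows. With routelib, the `route_direct` handler covers the same "send everything to one pool" case.
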
-The most basic example of a valid route would be:
-`mcp.attach(mcp.CMD_GET, function(r) return "SERVER_ERROR no route\r\n" end)`
+Our [Route
+library](https://github.com/memcached/memcached-proxylibs/tree/main/lib/routelib)
+can handle this for you and is a good abstraction to start from. For most
+people, the rest of this API reference is useful for creating your own route
+handlers. Since routelib handles the stages of execution, you can focus on
+processing requests rather than on how the configuration is built.
 
-For any get command, we will return the above string to the client. This isn't very useful as-is. We want to test the key and send the command to a specific backend pool; but the function only takes a request object. How are routes actually built?
+---
 
-The way we recommend configuring routes are with _function closures_. In lua functions can be created capturing the environment they were created in. For example:
+## Extending routelib with new handlers
 
-```lua
-function new_route()
-    local res = "SERVER_ERROR no route\r\n"
-    return function(r)
-        return res
-    end
-end
+Examples for extending routelib can be found
+[here](https://github.com/memcached/memcached-proxylibs/blob/main/lib/routelib/examples/);
+look for files that start with `custom`. The files are commented. More
+detailed information is repeated here.
+
+### Loading custom handlers
 
-mcp.attach(mcp.CMD_GET, new_route())
+You must instruct memcached to load your custom route handler files. If you
+are already using routelib, your start line may look like:
+
+```
+-o proxy_config=routelib.lua,proxy_arg=config.lua
 ```
 
-In this example, `new_route()` _returns a function_. This function has access to the environment (`local res = `) of its parent. When proxy calls the `CMD_GET` hook, it's calling the function that was returned by `new_route()`, not `new_route()` itself. This function uselessly returns a string.
+If your custom route handler is defined in `customhandler.lua`, you need to
+add it as an argument to `proxy_config`:
 
-This should give you enough context to understand how the libraries in [the proxylibs repository](https://github.com/memcached/memcached-proxylibs) are implemented.
+```
+-o proxy_config=routelib.lua:customhandler.lua,proxy_arg=config.lua
+```
+
+This lets the proxy manage loading and reloading the configuration for you,
+without requiring you to focus too much on Lua.
 
-Since we have a real programming language for both configuration and the routes themselves, we can write loops around patterns and keep the configuration short.
+### Custom handler configuration stages
 
-## Function generators and request contexts
+As described above in [configuration stages](#configuration-stages), there is a
+two-stage process to configuration loading. This needs to be taken into
+account when writing custom handlers for routelib as well.
 
-NOTE: This information is only useful for people intending to develop a route
-library or extend `routelib`. End users should read the
-[routelib](https://github.com/memcached/memcached-proxylibs/blob/main/lib/routelib/README.md) README.
+For routelib handlers, a `_conf` function is called during the first stage, and
+a `_start` function during the second, per-worker stage.
 
-To achieve high performance while still allowing dynamically scriptable route
-handling, we must A) pre-create data, strings, etc and B) avoid allocations,
-which avoids Lua garbage collection. We also need to carefully manage the
-lifecycle of pools and backends used by the proxy, and maintain access to
-useful context for the duration of a request.
+We must also register our handler with routelib, via a call to `register_route_handlers`, when the code is loading.
+
+For example:
+```lua
+-- This function is executed during the first stage, in the configuration VM.
+-- This is where we can check and transform arguments passed in from the
+-- routelib config.
+function route_myhello_conf(t, ctx)
+    -- User-supplied arguments are in the `t` table.
+    -- A `ctx` context object is also supplied, which can be queried for
+    -- information about the handler, help with statistical counters, etc.
+    t.msg = string.format("%s: %s", t.msg, ctx:label())
+    -- Get an internal ID and store it in `t`, to later use as a global counter
+    -- for how many times our hello function is called.
+    if t.stats then
+        t.stats_id = ctx:stats_get_id("myhello")
+    end
+
+    -- This return table must be a pure Lua object. It is copied _between_
+    -- the configuration VM and each worker VM, so special objects,
+    -- metatables, and so on cannot be transferred.
+    return t
+end
+```
 
-This requires wrapping request handling functions _with context_
+Next, we define the `_start` routine, which creates a function generator. This
+generator is used to create the coroutines that process requests.
 
 When a request is processed by the proxy, it needs to first acquire a _request
-context slot_. This provides a function that will execute a request. After a
-request is complete the slot may be _reused_ for the next request. This also
+context slot_. This provides a Lua function that will process a request. After a
+request is complete, the slot may be _reused_ for another request. This
 means we need _as many slots as there are parallel requests_. If a worker is
 processing three `get` requests _in parallel_, it will need to create three
 contexts in which to execute them.
 
-We use a function generator to create this data. This also allows us to
-pre-create strings, validate arguments, and so on in order to speed up
-requests.
+Pre-creating and caching request slots gives us good performance while still
+using a scripting language to process requests.
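
To make the slot idea concrete, here is a plain-Lua model that runs outside memcached. It is not how the proxy implements slots: `new_slot_pool`, `acquire`, and `release` are hypothetical names, and the real proxy runs each slot inside a coroutine, which this model leaves out. It only shows that a slot's setup code runs once per slot, while the function it returns runs once per request.

```lua
-- Hypothetical model of request context slots; not the proxy's implementation.
local function new_slot_pool(generator)
    local free = {}    -- slots that finished a request and can be reused
    local created = 0  -- number of slots the generator has built
    local pool = {}

    -- Reuse a free slot, or build a new one if every slot is busy.
    function pool.acquire()
        local slot = table.remove(free)
        if slot == nil then
            created = created + 1
            slot = generator()
        end
        return slot
    end

    -- Hand a slot back once its request is complete.
    function pool.release(slot)
        table.insert(free, slot)
    end

    function pool.created()
        return created
    end

    return pool
end

-- Shaped like route_myhello_f: setup runs once per slot, and the
-- returned closure runs once per request.
local setup_runs = 0
local function generator()
    setup_runs = setup_runs + 1
    local msg = string.format("SERVER_ERROR %s\r\n", "hello")
    return function(request)
        return msg
    end
end

local slots = new_slot_pool(generator)

-- Three requests in flight at the same time need three slots.
local a, b, c = slots.acquire(), slots.acquire(), slots.acquire()
slots.release(a)
slots.release(b)
slots.release(c)

-- Later requests arriving one at a time reuse an existing slot.
for _ = 1, 100 do
    local slot = slots.acquire()
    assert(slot("get foo") == "SERVER_ERROR hello\r\n")
    slots.release(slot)
end
```

After the loop, only three slots exist and the setup code has run three times, even though 103 requests were served.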
+
+```lua
+-- Here we prepare the generator that will later create each request slot;
+-- the slots are what ultimately process requests.
+-- This indirection allows us to share pre-processed information with every
+-- slot, further improving performance. It also lets all slots share internal
+-- references to pools and backends.
+
+-- This function is executed during the second stage, on every worker thread.
+-- The `t` argument is the table that was returned by the _conf function.
+-- All information needed to generate this handler _must_ be contained in this
+-- table. It is bad form to reach for global variables at this stage.
+function route_myhello_start(t, ctx)
+    local fgen = mcp.funcgen_new()
+    -- not adding any children for this function.
+
+    -- We pass in the argument table to the function generator directly here;
+    -- the table will then be passed along again when creating a slot.
+    -- The `n` argument sets a name for this function, visible by running
+    -- `stats proxyfuncs`.
+    -- `f` is the function to call when we need a new slot made.
+    fgen:ready({ a = t, n = ctx:label(), f = route_myhello_f })
+
+    -- make sure to return the generator we just made
+    return fgen
+end
 
-A minimal example:
+-- This is now the function that will be called every time we need a new slot.
+function route_myhello_f(rctx, a)
+    -- Do some final formatting of the message string.
+    -- Note there is no reason we can't do this in the `_start` function,
+    -- but we are doing this here to make a simple example.
+    local msg = string.format("SERVER_ERROR %s\r\n", a.msg)
 
+    local stats_id = a.stats_id
+    local s = mcp.stat -- small Lua optimization: cache the stat function
+
+    -- This is the function called to process an actual request.
+    -- It will be cached and called again later on the next request,
+    -- so the setup code above runs once per slot, not once per request.
+    return function(r)
+        if stats_id then
+            s(stats_id, 1)
+        end
+        return msg
+    end
+end
+
+-- Finally, we register this handler so routelib will allow us to use it in a
+-- config file.
+register_route_handlers({
+    "myhello",
+})
+```
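
The `myhello` handler can be exercised outside memcached with a small plain-Lua harness. The `mcp` table and `fake_ctx` object below are hypothetical stand-ins that mimic only the calls the example makes; the real objects come from the proxy and routelib. The harness also skips the copy between VMs and calls the slot generator directly, so it checks the handler's data flow, not the proxy's behavior.

```lua
-- Hypothetical stand-ins so the handler runs under a plain `lua` interpreter.
-- They mimic only the calls used by the myhello example.
local counters = {}
mcp = {
    stat = function(id, n)
        counters[id] = (counters[id] or 0) + n
    end,
}
local fake_ctx = {
    label = function(self) return "hello_route" end,
    stats_get_id = function(self, name) return 1 end,
}

-- Stage one: mirrors route_myhello_conf, storing the stats ID in `t`.
function route_myhello_conf(t, ctx)
    t.msg = string.format("%s: %s", t.msg, ctx:label())
    if t.stats then
        t.stats_id = ctx:stats_get_id("myhello")
    end
    return t
end

-- Slot generator: mirrors route_myhello_f (called once per new slot).
function route_myhello_f(rctx, a)
    local msg = string.format("SERVER_ERROR %s\r\n", a.msg)
    local stats_id = a.stats_id
    local s = mcp.stat
    return function(r)
        if stats_id then
            s(stats_id, 1)
        end
        return msg
    end
end

-- Run stage one, build a single slot, then send it two requests.
local conf = route_myhello_conf({ msg = "hello", stats = true }, fake_ctx)
local slot = route_myhello_f(nil, conf)
local reply = slot("get foo")
slot("get bar")
```

If `_conf` did not store the value returned by `ctx:stats_get_id`, `a.stats_id` would be `nil` inside the slot and the counter would never be incremented.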
+
+Another way to write the `_start` function, using more advanced
+Lua syntax:
 ```lua
-function mcp_config_routes(pool)
-    -- get a new bare object.
+function route_myhello_start(t, ctx)
     local fgen = mcp.funcgen_new()
-    -- reference this pool object so it cannot deallocate and be lost.
-    -- note that we can also reference _other function generators_,
-    -- providing us with a graph/tree/recursive style configuration.
-    local handle = fgen:new_handle(pool)
+    local msg = string.format("SERVER_ERROR %s\r\n", t.msg)
+
+    fgen:ready({ n = ctx:label(), f = function(rctx)
+        -- Previously this would be `route_myhello_f`.
+        -- The difference is we've generated msg just once for every slot that
+        -- will later be made, and we read `t` through the closure instead of
+        -- the passed-along arguments table.
+        local stats_id = t.stats_id
+        local s = mcp.stat
+
+        return function(r)
+            if stats_id then
+                s(stats_id, 1)
+            end
+            return msg
+        end
+    end})
+
+    return fgen
+end
+```
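
Once registered, the handler can be referenced from a routelib config file (the file passed via `proxy_arg`). The sketch below assumes that `register_route_handlers({"myhello"})` makes a `route_myhello{}` constructor available, following the same naming pattern as `route_direct{}`. The `hello` map entry, the `msg` text, and the `stats` flag are illustrative values only.

```lua
-- Hypothetical routelib config using the custom myhello handler.
-- Assumes route_myhello{} is exposed once register_route_handlers has run.
pools{
    main = { backends = { "127.0.0.1:11211" } }
}

routes{
    map = {
        -- These arguments arrive in route_myhello_conf as the `t` table.
        hello = route_myhello{ msg = "hello", stats = true },
        main = route_direct{ child = "main" },
    },
}
```

With the handler code in `customhandler.lua`, this file would be loaded with a start line such as `-o proxy_config=routelib.lua:customhandler.lua,proxy_arg=config.lua`.
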
+
+### Custom handlers with children
 
-    -- finalize the function generator object. When we need to create a new
-    -- slot, we will call `route_handler`, which will return a function
-    fgen:ready({ f = route_handler, a = handle })
+Handlers that route to pools (or other handlers) must define their children.
+Routelib will automatically translate the _name_ of a child into a _pool object_
+in between the two configuration stages. This removes a lot of object
+management we would otherwise have to do ourselves.
 
-    -- attach this function generator to `get` commands
-    mcp.attach(mcp.CMD_GET, fgen)
+The simplest example is the `route_direct` handler. Given a routelib
+configuration that looks like:
+```lua
+pools{
+    main = { backends = { "127.0.0.1:11211" } }
+}
+
+routes{
+    map = {
+        main = route_direct{
+            child = "main"
+        }
+    },
+}
+```
+
+Above, you can see the `route_direct` handler is given a single `child`
+argument.
+
+```lua
+function route_direct_conf(t)
+    -- This table contains: t.child = "main", but we don't need to do any
+    -- processing and can simply pass this along.
+    -- Routelib looks for any keys that begin with `child` and attempts to
+    -- translate them from words to internal pool objects for us.
+    return t
 end
 
--- our job is to pre-configure a reusable function that will process requests
-function route_handler(rctx, a)
-    -- anything created here is available in the function below as part of its
-    -- local environment
-    local handle = a
+local function route_direct_f(rctx, handle)
+    -- Nothing else to do when generating this slot.
     return function(r)
-        -- the rctx object is unique to _each slot generated_
-        -- it gives us access to backend API calls, info about the client
-        -- connection, and so on.
+        -- All we do when processing the actual request is send it down to the
+        -- handle we made earlier.
         return rctx:enqueue_and_wait(r, handle)
     end
 end
+
+function route_direct_start(t, ctx)
+    local fgen = mcp.funcgen_new()
+    -- t.child has been translated into something we can use here.
+    -- At this point we reference it into a handle we can later route to.
+    -- Handles must be generated once and are reused for every slot generated.
+    local handle = fgen:new_handle(t.child)
+    fgen:ready({ a = handle, n = ctx:label(), f = route_direct_f })
+    return fgen
+end
 ```
 
+Aside from `child`, the most common argument is `children`, which will be a
+table: either an array of pools or a table of `name = pool` pairs. The named
+form is commonly used for zoned route handlers (see `route_zfailover` for an example).
+
+All of this indirection (child -> pool object -> handle) is how the proxy
+manages the lifecycle of backend connections, which is important when
+live-reloading the configuration. Requests already in flight are not affected
+by a configuration change, and resources are freed as soon as possible.
+
 ---
 
-## Request context API and backend requests
+## Request context API and executing requests
 
 Function generator API
 ```lua
@@ -191,7 +382,7 @@ generating the function to ultimately serve requests.
 -- execute this function callback when this handle has been executed.
 -- this allows adding extra handling (stats counters, logging) or overriding
 -- wait conditions
-rctx:handle_set_cb(handle, function)
+rctx:handle_set_cb(handle, func)
 
 -- callbacks look like:
 local cb = function(res, req)
@@ -276,7 +467,6 @@ flowchart TD
 C -->|request| D[backend]
 C -->|request| E[backend]
 ```
-```
 
 ### Where is the results table?
 
@@ -314,9 +504,8 @@ end
 Let's say we want to route requests to different pools of memcached instances
 based on part of the key in the request: for this we use router objects.
 
-Routers can achieve this matching efficiently without having to examine the
-key inside Lua, which would result in many copies and regular
-expressions.
+If you are using routelib, you never need to create routers directly.
+Routelib exposes most of this functionality under its `routes{}` section.
 
 ```lua
 -- Minimal example of a router. Here:
