This is an out of the box solution for ingesting periodic health pings from services that cannot expose a public HTTP endpoint - such as serverless webhooks, cron jobs, etc. This ping
service receives pings with a service name, a frequency and optionally a timestamp. When this service receives a ping, it's expecting to receive another ping within the timeframe inferred from the frequency param - if it doesn't hear within that frequency, an alert will be sent to a downstream alerting service (which can be configured).
I use asdf
as my local package manager - to get up and running you'll need to run the following from the root of the directory:
# on mac
brew install asdf
asdf install
Additionally, you'll need to set the following environment variable PING_ALERT_URI
- this is the Alerts Service (AS) callback URI:
echo 'export PING_ALERT_URI="some.alert.service"' >> ~/.zshrc # or ~/.bashrc if you use bash
After that's set up run the following from the root of the project directory:
mix deps.get && mix deps.compile
Now you'll be able to start up the server locally or run the tests using either of the following commands:
iex -S mix
mix test
The /ping
endpoint accepts 3 parameters, 2 are necessary (name
, frequency
) and 1 is optional (timestamp
), in the following example both are valid requests:
curl "127.0.0.1:4000/ping?name=test_service&frequency=10m"
curl "127.0.0.1:4000/ping?name=test_service&frequency=10m×tamp=1664743905"
If a timestamp
parameter is not given, a timestamp will be generated on insertion into the backend server.
The following are accepted as parameters:
- The
name
parameter is any valid string. - The
frequency
parameter lets the service know how often it should be expecting a health ping. Valid formats are#{integer}#{time_unit}
, with the following as valid time units - [W
,D
,h
,m
,s
], representing [Weekly, Daily, Hourly, Minutely, Secondly] frequencies respectively. You can change frequencies on the fly - i.e. if I receive a ping at 10:00:00AM with a frequency of 1m, and then at 10:00:30AM I receive another ping from the same service with a frequency of 10m the service will expect the next ping to come at 10:10:30AM instead of 10:01:00AM.
For example, if we have a job called daily_digest
. This job will call the ping service every time it is run as follows:
GET /ping?name=daily_digest&frequency=1D
Once the ping service receives this request, it should expect the subsequent request within the next day, failing to hear back within that timeframe an alert endpoint will be called. For local development you can set up your own webhook on webhook.site as the alert endpoint.
In the case that an upstream service becomes deprecated, there is an endpoint for removing that service from the internal monitoring system.
This endpoint accepts 1 parameter:
name
— the name of the service
For example, if we have a job called daily_digest
and we deprecate this job (maybe in favor of a weekly_digest
job) we no longer have a need to monitor this job because we're no longer expecting pings. To remove this job we could call:
curl -X "DELETE" "127.0.0.1:4000/ping/daily_digest"
The HTTP server for this service was rewritten so that Phoenix and it's dependencies were dropped and replaced with a plain old plug cowboy http server. The original Phoenix project was spun up using the following flags: --no-assets --no-ecto --no-html --no-gettext --no-live --no-dashboard --no-mailer
so that this service would have as lean of an API as possible, but using Phoenix still came with a very large performance overhead cost.
By ripping out Phoenix and its dependencies, I saw about 90% faster API response times.
The original Phoenix version is in the branch phoenix-version
if you'd like to test it out yourself.
Endpoint | HTTP Server | Avg Response Time |
---|---|---|
GET /ping | Plug Cowboy | 104μs |
Phoenix | 1005μs | |
DELETE /ping | Plug Cowboy | 63μs |
Phoenix | 702μs |
- Add
/ping/remove
endpoint which accepts aname
parameter. - Performance tuning - speed up API by ditching Phoenix.
- Replace
:tesla
with:req
- Add guard for last_ping_timestamp - don't allow incoming pings to overwrite previous pings
- Guard against negative intervals
- Move state off of this service - maybe try out SQLite -> Fly's LiteFS. Could potentially have an adapter for stateful / stateless (current)
- Add some automated CI/CD for testing/formatting/linting.