Skip to content

Commit 2dc0dba

Browse files
Insights event batching (#578)
* Basic config and client worker integration * EventsWorker * Fix issue where throttled batches are dropped * Docs, spec stuff & logging dropped events * Use queue for batches, drop batch for throttling This replaces the array with a queue so we can get FIFO behavior when pulling batches. This is to ensure that we attempt to send the oldest batch first. With this, I'm also using the same batch dropping logic for throttled batches. This way, we drop the oldest batch after 3 failures (roughly 3 minutes). When the user upgrades or the throttle is reset, the events we send right after will be closer to the current time. * Apply suggestions from code review Co-authored-by: Parker Selbert <[email protected]> * Move more init into the worker * Put the old config stuff back * Replace timer_ref with GenServer timeouts Based on @sorentwo's suggestion, I refactored the timer logic to lean on GenServer timeouts, and it made the code much cleaner. Now the GenServer callback return values contain the timeout, which we calculate for each response. If we are throttling, we default to the 60s throttle wait time, otherwise we use the configured timeout. A cast or call should reset the timeout with the 3rd position of the callback return, so each invocation essentially resets timeout. * Fix timeout issues I did not test the timeout functionality very well in the first pass. It was resetting the timeout on each successive event push, which is not what I wanted. This change keeps track of the start of the timeout and uses that to update the GenServer with the _remaining_ timeout. Now it behaves like a true batch limit OR timeout limit. * Set throttle_wait from mix config Without the default, throttle_wait was set to nil in the state. * Remove error responses in HTTPAdapter The client is expecting {:ok, %HTTPResponse} for even "failing" responses (400-599). Returning error responses here was keeping the EventWorker from reponding properly for 429 responses. --------- Co-authored-by: Parker Selbert <[email protected]>
1 parent 8e5f096 commit 2dc0dba

File tree

8 files changed

+750
-20
lines changed

8 files changed

+750
-20
lines changed

lib/honeybadger.ex

Lines changed: 11 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -169,7 +169,7 @@ defmodule Honeybadger do
169169

170170
require Logger
171171

172-
alias Honeybadger.{Client, Notice}
172+
alias Honeybadger.{Client, Notice, EventsWorker}
173173
alias Honeybadger.Breadcrumbs.{Collector, Breadcrumb}
174174

175175
@type notify_options ::
@@ -206,7 +206,9 @@ defmodule Honeybadger do
206206
Honeybadger.Breadcrumbs.Telemetry.attach()
207207
end
208208

209-
Supervisor.start_link([{Client, [config]}], strategy: :one_for_one)
209+
children = [{Client, [config]}, EventsWorker]
210+
211+
Supervisor.start_link(children, strategy: :one_for_one)
210212
end
211213

212214
@doc """
@@ -345,9 +347,13 @@ defmodule Honeybadger do
345347
def event(event_data) do
346348
ts = DateTime.utc_now() |> DateTime.to_string()
347349

348-
event_data
349-
|> Map.put_new(:ts, ts)
350-
|> Client.send_event()
350+
data = Map.put_new(event_data, :ts, ts)
351+
352+
if get_env(:events_worker_enabled) do
353+
EventsWorker.push(data)
354+
else
355+
Client.send_event(data)
356+
end
351357
end
352358

353359
@doc """

lib/honeybadger/client.ex

Lines changed: 79 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -90,6 +90,23 @@ defmodule Honeybadger.Client do
9090
end
9191
end
9292

93+
@doc """
94+
Send events in batches
95+
"""
96+
@spec send_events(list) :: :ok | {:error, reason :: atom()}
97+
def send_events(events) when is_list(events) do
98+
if pid = Process.whereis(__MODULE__) do
99+
# 30 second timeout
100+
GenServer.call(pid, {:events, events}, 30_000)
101+
else
102+
Logger.warning(fn ->
103+
"[Honeybadger] Unable to post events, the :honeybadger client isn't running"
104+
end)
105+
106+
{:error, :client_not_running}
107+
end
108+
end
109+
93110
@doc """
94111
Check whether reporting is enabled for the current environment.
95112
@@ -148,7 +165,7 @@ defmodule Honeybadger.Client do
148165
|> Map.get(:hackney_opts)
149166
|> Keyword.merge(opts)
150167

151-
post_notice(url, headers, payload, hackney_opts)
168+
post_payload(url, headers, payload, hackney_opts)
152169

153170
{:error, %Jason.EncodeError{message: message}} ->
154171
Logger.warning(fn -> "[Honeybadger] Notice encoding failed: #{message}" end)
@@ -186,7 +203,7 @@ defmodule Honeybadger.Client do
186203
|> Keyword.merge(opts)
187204

188205
# post logic for events is the same as notices
189-
post_notice(event_url, headers, payload, hackney_opts)
206+
post_payload(event_url, headers, payload, hackney_opts)
190207

191208
{:error, %Jason.EncodeError{message: message}} ->
192209
Logger.warning(fn -> "[Honeybadger] Event encoding failed: #{message}" end)
@@ -205,13 +222,72 @@ defmodule Honeybadger.Client do
205222
{:noreply, state}
206223
end
207224

225+
# Events
226+
#
227+
@impl GenServer
228+
def handle_call({:events, _}, _from, %{enabled: false} = state) do
229+
{:reply, :ok, state}
230+
end
231+
232+
def handle_call({:events, _}, _from, %{api_key: nil} = state) do
233+
{:reply, {:error, :no_api_key}, state}
234+
end
235+
236+
def handle_call({:events, events}, _from, state) do
237+
%{event_url: event_url, headers: headers} = state
238+
# Convert each event to JSON and join with newlines
239+
encoded_events = Enum.map(events, &Honeybadger.JSON.encode/1)
240+
241+
# Check if any encoding failed
242+
if Enum.any?(encoded_events, &match?({:error, _}, &1)) do
243+
{:reply, {:error, :encoding_error}, state}
244+
else
245+
payload = Enum.map_join(encoded_events, "\n", &elem(&1, 1))
246+
247+
opts =
248+
state
249+
|> Map.take([:proxy, :proxy_auth])
250+
|> Map.put(:pool, __MODULE__)
251+
|> Keyword.new()
252+
253+
hackney_opts =
254+
state
255+
|> Map.get(:hackney_opts)
256+
|> Keyword.merge(opts)
257+
258+
headers =
259+
List.keyreplace(headers, "Content-Type", 0, {"Content-Type", "application/x-ndjson"})
260+
261+
response =
262+
case HTTPAdapter.request(:post, event_url, payload, headers, hackney_opts) do
263+
{:ok, %HTTPResponse{body: body, status: status}} when status in 200..399 ->
264+
Logger.debug(fn -> "[Honeybadger] API success: #{inspect(body)}" end)
265+
:ok
266+
267+
{:ok, %HTTPResponse{status: status}} when status == 429 ->
268+
Logger.warning("[Honeybadger] API rate limited:")
269+
{:error, :throttled}
270+
271+
{:ok, %HTTPResponse{status: status}} when status in 400..599 ->
272+
Logger.warning("[Honeybadger] API failure")
273+
{:error, :api_error}
274+
275+
{:error, reason} ->
276+
Logger.warning(fn -> "[Honeybadger] connection error: #{inspect(reason)}" end)
277+
{:error, :connection_error}
278+
end
279+
280+
{:reply, response, state}
281+
end
282+
end
283+
208284
# API Integration
209285

210286
defp build_headers(opts) do
211287
[{"X-API-Key", get_env(opts, :api_key)}] ++ @headers
212288
end
213289

214-
defp post_notice(url, headers, payload, hackney_opts) do
290+
defp post_payload(url, headers, payload, hackney_opts) do
215291
case HTTPAdapter.request(:post, url, payload, headers, hackney_opts) do
216292
{:ok, %HTTPResponse{body: body, status: status}} when status in 200..399 ->
217293
Logger.debug(fn -> "[Honeybadger] API success: #{inspect(body)}" end)

lib/honeybadger/events_worker.ex

Lines changed: 242 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,242 @@
1+
defmodule Honeybadger.EventsWorker do
2+
@moduledoc """
3+
A GenServer that batches and sends events with retry and throttling logic.
4+
5+
It accumulates events in a queue, forms batches when the batch size is reached or
6+
when a flush timeout expires, and then sends these batches to a backend module.
7+
If a batch fails to send, it will be retried (up to a configurable maximum) or dropped.
8+
In case of throttling (e.g. receiving a 429), the flush delay is increased.
9+
"""
10+
11+
@dropped_log_interval 60_000
12+
13+
use GenServer
14+
require Logger
15+
16+
defmodule State do
17+
@typedoc """
18+
Function that accepts a list of events to be processed.
19+
"""
20+
@type send_events_fn :: ([term()] -> :ok | {:error, :throttled} | {:error, term()})
21+
22+
@typedoc """
23+
State for the event batching GenServer.
24+
"""
25+
@type t :: %__MODULE__{
26+
# Configuration
27+
send_events_fn: send_events_fn(),
28+
batch_size: pos_integer(),
29+
max_queue_size: pos_integer(),
30+
timeout: pos_integer(),
31+
max_batch_retries: non_neg_integer(),
32+
throttle_wait: pos_integer(),
33+
34+
# Internal state
35+
timeout_started_at: non_neg_integer(),
36+
throttling: boolean(),
37+
dropped_events: non_neg_integer(),
38+
last_dropped_log: non_neg_integer(),
39+
queue: [any()],
40+
batches: :queue.queue()
41+
}
42+
43+
@enforce_keys [
44+
:send_events_fn,
45+
:batch_size,
46+
:max_queue_size,
47+
:max_batch_retries
48+
]
49+
50+
defstruct [
51+
:send_events_fn,
52+
:batch_size,
53+
:max_queue_size,
54+
:timeout,
55+
:max_batch_retries,
56+
:last_dropped_log,
57+
timeout_started_at: 0,
58+
throttle_wait: 60000,
59+
throttling: false,
60+
dropped_events: 0,
61+
queue: [],
62+
batches: :queue.new()
63+
]
64+
end
65+
66+
@spec start_link(Keyword.t()) :: GenServer.on_start()
67+
def start_link(opts \\ []) do
68+
if Honeybadger.get_env(:events_worker_enabled) do
69+
{name, opts} = Keyword.pop(opts, :name, __MODULE__)
70+
GenServer.start_link(__MODULE__, opts, name: name)
71+
else
72+
:ignore
73+
end
74+
end
75+
76+
@spec push(event :: map(), GenServer.server()) :: :ok
77+
def push(event, server \\ __MODULE__) do
78+
GenServer.cast(server, {:push, event})
79+
end
80+
81+
@spec state(GenServer.server()) :: State.t()
82+
def state(server \\ __MODULE__) do
83+
GenServer.call(server, {:state})
84+
end
85+
86+
@impl true
87+
def init(opts) do
88+
config = %{
89+
send_events_fn: Keyword.get(opts, :send_events_fn, &Honeybadger.Client.send_events/1),
90+
batch_size: Keyword.get(opts, :batch_size, Honeybadger.get_env(:events_batch_size)),
91+
timeout: Keyword.get(opts, :timeout, Honeybadger.get_env(:events_timeout)),
92+
throttle_wait:
93+
Keyword.get(opts, :throttle_wait, Honeybadger.get_env(:events_throttle_wait)),
94+
max_queue_size:
95+
Keyword.get(opts, :max_queue_size, Honeybadger.get_env(:events_max_queue_size)),
96+
max_batch_retries:
97+
Keyword.get(opts, :max_batch_retries, Honeybadger.get_env(:events_max_batch_retries)),
98+
last_dropped_log: System.monotonic_time(:millisecond)
99+
}
100+
101+
state = struct!(State, config)
102+
{:ok, state}
103+
end
104+
105+
@impl true
106+
def handle_call({:state}, _from, %State{} = state) do
107+
{:reply, state, state, current_timeout(state)}
108+
end
109+
110+
@impl true
111+
def handle_cast({:push, event}, %State{timeout_started_at: 0} = state) do
112+
handle_cast({:push, event}, reset_timeout(state))
113+
end
114+
115+
def handle_cast({:push, event}, %State{} = state) do
116+
if total_event_count(state) >= state.max_queue_size do
117+
{:noreply, %{state | dropped_events: state.dropped_events + 1}, current_timeout(state)}
118+
else
119+
queue = [event | state.queue]
120+
121+
if length(queue) >= state.batch_size do
122+
flush(%{state | queue: queue})
123+
else
124+
{:noreply, %{state | queue: queue}, current_timeout(state)}
125+
end
126+
end
127+
end
128+
129+
@impl true
130+
def handle_info(:timeout, state), do: flush(state)
131+
132+
@impl true
133+
def terminate(_reason, %State{} = state) do
134+
Logger.debug("[Honeybadger] Terminating with #{total_event_count(state)} events unsent")
135+
_ = flush(state)
136+
:ok
137+
end
138+
139+
@spec flush(State.t()) :: {:noreply, State.t(), pos_integer()}
140+
defp flush(state) do
141+
cond do
142+
state.queue == [] and :queue.is_empty(state.batches) ->
143+
# It's all empty so we stop the timeout and reset the
144+
# timeout_started_at which will restart on the next push
145+
{:noreply, %{state | timeout_started_at: 0}}
146+
147+
state.queue == [] ->
148+
attempt_send(state)
149+
150+
true ->
151+
batches = :queue.in(%{batch: Enum.reverse(state.queue), attempts: 0}, state.batches)
152+
attempt_send(%{state | queue: [], batches: batches})
153+
end
154+
end
155+
156+
@spec attempt_send(State.t()) :: {:noreply, State.t(), pos_integer()}
157+
# Sends pending batches, handling retries and throttling
158+
defp attempt_send(%State{} = state) do
159+
{new_batches_list, throttling} =
160+
Enum.reduce(:queue.to_list(state.batches), {[], false}, fn
161+
# If already throttled, skip sending and retain the batch.
162+
b, {acc, true} ->
163+
{acc ++ [b], true}
164+
165+
%{batch: batch, attempts: attempts} = b, {acc, false} ->
166+
case state.send_events_fn.(batch) do
167+
:ok ->
168+
Logger.debug("[Honeybadger] Sent batch of #{length(batch)} events.")
169+
{acc, false}
170+
171+
{:error, reason} ->
172+
throttling = reason == :throttled
173+
updated_attempts = attempts + 1
174+
175+
if throttling do
176+
Logger.warning(
177+
"[Honeybadger] Rate limited (429) events - (batch attempt #{updated_attempts}) - waiting for #{state.throttle_wait}ms"
178+
)
179+
else
180+
Logger.debug(
181+
"[Honeybadger] Failed to send events batch (attempt #{updated_attempts}): #{inspect(reason)}"
182+
)
183+
end
184+
185+
if updated_attempts < state.max_batch_retries do
186+
{acc ++ [%{b | attempts: updated_attempts}], throttling}
187+
else
188+
Logger.debug(
189+
"[Honeybadger] Dropping events batch after #{updated_attempts} attempts."
190+
)
191+
192+
{acc, throttling}
193+
end
194+
end
195+
end)
196+
197+
current_time = System.monotonic_time(:millisecond)
198+
199+
# Log dropped events if present and we haven't logged within the last
200+
# @dropped_log_interval
201+
state =
202+
if state.dropped_events > 0 and
203+
current_time - state.last_dropped_log >= @dropped_log_interval do
204+
Logger.info("[Honeybadger] Dropped #{state.dropped_events} events due to max queue limit")
205+
%{state | dropped_events: 0, last_dropped_log: current_time}
206+
else
207+
state
208+
end
209+
210+
new_state =
211+
%{state | batches: :queue.from_list(new_batches_list), throttling: throttling}
212+
|> reset_timeout()
213+
214+
{:noreply, new_state, current_timeout(new_state)}
215+
end
216+
217+
@spec total_event_count(State.t()) :: non_neg_integer()
218+
# Counts events in both the queue and pending batches.
219+
defp total_event_count(%State{batches: batches, queue: queue}) do
220+
events_count = length(queue)
221+
222+
batch_count = :queue.fold(fn %{batch: b}, acc -> acc + length(b) end, 0, batches)
223+
224+
events_count + batch_count
225+
end
226+
227+
# Returns the time remaining until the next flush
228+
defp current_timeout(%State{
229+
throttling: throttling,
230+
timeout: timeout,
231+
throttle_wait: throttle_wait,
232+
timeout_started_at: timeout_started_at
233+
}) do
234+
elapsed = System.monotonic_time(:millisecond) - timeout_started_at
235+
timeout = if throttling, do: throttle_wait, else: timeout
236+
max(1, timeout - elapsed)
237+
end
238+
239+
defp reset_timeout(state) do
240+
%{state | timeout_started_at: System.monotonic_time(:millisecond)}
241+
end
242+
end

0 commit comments

Comments
 (0)