Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reconnect to socket on exception #186

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

VladYermakov
Copy link

@VladYermakov VladYermakov commented May 30, 2024

Hi there, we're using a fork of your library at Lyft, and some time ago bumped into the issue with TCP socket, describing the issue below:

Turns out TCP socket connection could close after some time and current setup does not handle that well.

Traceback (most recent call last):
  File "/code/etaonlinelearner/models/main/unified/eta_model_trainer_unified.py", line 305, in train
    model = self._train(start_time)
  File "/code/etaonlinelearner/models/main/unified/eta_model_trainer_unified.py", line 419, in _train
    region_allow_list = self.metrics_helper.calculate_test_accuracy_metrics(
  File "/code/etaonlinelearner/models/helpers/metrics_helper.py", line 189, in calculate_test_accuracy_metrics
    regional_accuracy_metrics = self._calculate_regional_accuracy_metrics(
  File "/code/etaonlinelearner/models/helpers/metrics_helper.py", line 313, in _calculate_regional_accuracy_metrics
    return combined.groupby(self.region).apply(
  File "/code/venvs/venv/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1567, in apply
    result = self._python_apply_general(f, self._selected_obj)
  File "/code/venvs/venv/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1629, in _python_apply_general
    values, mutated = self.grouper.apply(f, data, self.axis)
  File "/code/venvs/venv/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 839, in apply
    res = f(group)
  File "/code/etaonlinelearner/models/helpers/metrics_helper.py", line 326, in <lambda>
    self.mae_improvement: self._mae_improvement(
  File "/code/etaonlinelearner/models/helpers/metrics_helper.py", line 572, in _mae_improvement
    self._log_accuracy_metrics(f"learner_{stats_prefix}", mae_learner, region)
  File "/code/etaonlinelearner/models/helpers/metrics_helper.py", line 453, in _log_accuracy_metrics
    self.statsd.incr(f"{stats_prefix}_counter", metric, tags=tags)
  File "/code/venvs/venv/lib/python3.8/site-packages/lyft_stats/stats.py", line 74, in incr
    self._client.incr(self._p(stat), count, rate, tags=self._process_tags(tags, per_host))
  File "/code/venvs/venv/lib/python3.8/site-packages/lyft_stats/client.py", line 135, in incr
    self._send_stat(stat, '%s|c' % count, rate, tags)
  File "/code/venvs/venv/lib/python3.8/site-packages/lyft_stats/client.py", line 159, in _send_stat
    self._after(self._prepare(stat, value, rate, tags))
  File "/code/venvs/venv/lib/python3.8/site-packages/lyft_stats/client.py", line 177, in _after
    self._send(data)
  File "/code/venvs/venv/lib/python3.8/site-packages/lyft_stats/client.py", line 246, in _send
    self._do_send(data)
  File "/code/venvs/venv/lib/python3.8/site-packages/lyft_stats/client.py", line 251, in _do_send
    self._sock.sendall(data.encode('ascii') + b'\n')
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/code/etaonlinelearner/models/main/unified/eta_model_trainer_unified.py", line 311, in train
    statsd.incr("training_failed", tags=self.tags)
  File "/code/venvs/venv/lib/python3.8/site-packages/lyft_stats/stats.py", line 74, in incr
    self._client.incr(self._p(stat), count, rate, tags=self._process_tags(tags, per_host))
  File "/code/venvs/venv/lib/python3.8/site-packages/lyft_stats/client.py", line 135, in incr
    self._send_stat(stat, '%s|c' % count, rate, tags)
  File "/code/venvs/venv/lib/python3.8/site-packages/lyft_stats/client.py", line 159, in _send_stat
    self._after(self._prepare(stat, value, rate, tags))
  File "/code/venvs/venv/lib/python3.8/site-packages/lyft_stats/client.py", line 177, in _after
    self._send(data)
  File "/code/venvs/venv/lib/python3.8/site-packages/lyft_stats/client.py", line 246, in _send
    self._do_send(data)
  File "/code/venvs/venv/lib/python3.8/site-packages/lyft_stats/client.py", line 251, in _do_send
    self._sock.sendall(data.encode('ascii') + b'\n')
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/code/venvs/venv/lib/python3.8/site-packages/lyftlearnclient/build_tools/model_image_tools.py", line 392, in train_with_model_hyperparameters
    trained_model = model.train()
  File "/code/etaonlinelearner/models/main/unified/eta_model_trainer_unified.py", line 315, in train
    statsd.timing(
  File "/code/venvs/venv/lib/python3.8/site-packages/lyft_stats/stats.py", line 70, in timing
    self._client.timing(self._p(stat), delta, rate, tags=self._process_tags(tags, per_host))
  File "/code/venvs/venv/lib/python3.8/site-packages/lyft_stats/client.py", line 131, in timing
    self._send_stat(stat, '%0.6f|ms' % delta, rate, tags)
  File "/code/venvs/venv/lib/python3.8/site-packages/lyft_stats/client.py", line 159, in _send_stat
    self._after(self._prepare(stat, value, rate, tags))
  File "/code/venvs/venv/lib/python3.8/site-packages/lyft_stats/client.py", line 177, in _after
    self._send(data)
  File "/code/venvs/venv/lib/python3.8/site-packages/lyft_stats/client.py", line 246, in _send
    self._do_send(data)
  File "/code/venvs/venv/lib/python3.8/site-packages/lyft_stats/client.py", line 251, in _do_send
    self._sock.sendall(data.encode('ascii') + b'\n')
BrokenPipeError: [Errno 32] Broken pipe

Proposed solution is reconnecting to socket on exception

@VladYermakov VladYermakov marked this pull request as ready for review October 2, 2024 08:48
@VladYermakov
Copy link
Author

Hi @jsocol , don't see an option to tag you as a reviewer, so mentioning you here
Please take a look at the above issue and proposed solution. We at Lyft use it for a while now and it fixed the issue for us, so I wanted to populate it in your library for fix to be available for everyone
Thanks in advance

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant