Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SignalRClient.start() may get stuck on keepalive timeout and closed TCP connection #628

Open
theOehrly opened this issue Aug 20, 2024 Discussed in #627 · 2 comments
Open
Labels
bug Something isn't working

Comments

@theOehrly
Copy link
Owner

Discussed in #627

Originally posted by supremeremixx August 19, 2024
Hello!
I'm trying to run endless loop with SignalRClient.

import time
import logging
from fastf1.livetiming.client import SignalRClient

log = logging.getLogger()
log.setLevel(logging.DEBUG)

def signalr():
  client = SignalRClient(filename="output.txt", timeout=1800, debug=True)
  client.start()

while True:
  signalr()
  time.sleep(60)

After random amount of time (3, 5 , 11 hours...) I get keepalive timeout error. In logs I see that TCP connection is closed, but function client.start() still blocked, it never ends. Loop is never finished and that's why program not creating a new client instance after error.

python3[282709]: 2024-08-09 21:36:44,532 - DEBUG: ! timed out waiting for keepalive pong
python3[282709]: 2024-08-09 21:36:44,533 - DEBUG: ! failing connection with code 1011
python3[282709]: 2024-08-09 21:36:44,535 - DEBUG: = connection is CLOSING
python3[282709]: 2024-08-09 21:36:44,535 - DEBUG: > CLOSE 1011 (internal error) keepalive ping timeout [24 bytes]
python3[282709]: 2024-08-09 21:36:54,536 - DEBUG: ! timed out waiting for TCP close
python3[282709]: 2024-08-09 21:36:54,536 - DEBUG: x closing TCP connection
python3[282709]: 2024-08-09 21:37:04,539 - DEBUG: ! timed out waiting for TCP close
python3[282709]: 2024-08-09 21:37:04,539 - DEBUG: x aborting TCP connection
python3[282709]: 2024-08-09 21:37:04,540 - DEBUG: = connection is CLOSED
python3[282709]: 2024-08-09 21:59:44,619 - WARNING: Timeout - received no data for more than 1800 seconds!

I've tried running it with and without timeout (in SignalRClient), but the function remains locked.
Is it possible to somehow terminate it after receiving an error?

@theOehrly theOehrly added the bug Something isn't working label Aug 20, 2024
@supremeremixx
Copy link

Hello @theOehrly !
I made a little research so maybe this information will help you to fix this bug.

When I tried to run SignalRClient with timeout, after receiving an error and the timeout time has expired, the function client.start() was stucked in loop while self._connection.started in _supervise (livetiming->client.py).

    async def _supervise(self):
        self._t_last_message = time.time()
        while True:
            if (self.timeout != 0
                    and time.time() - self._t_last_message > self.timeout):
                self.logger.warning(f"Timeout - received no data for more "
                                    f"than {self.timeout} seconds!")

                self._connection.close()
                while self._connection.started:
                    await asyncio.sleep(0.1)
                return

            await asyncio.sleep(1)

Apparently, since the connection has already been closed due to the error that occurred, the self._connection.close() function does not set the self._connection.started flag to False. For testing I manually added self._connection.started = False after close() function and SignalRClient was shutted down!

For another example, you can add these lines to the received message handler. If only the self._connection.close() function is there, I can see in the logs how the connection is closed, but the function remains in a blocking state (in fact it is a simulation of an error occurrence), but if you add self._connection.started = False after that, the function is terminated.

In normal conditions after self._connection.close() is called it trying to call self.__transport.close() (signalr_aio->transports->_transport.py) which is sents CloseEvent() to ws_loop.

def close(self):
    asyncio.Task(self.invoke_queue.put(CloseEvent()), loop=self.ws_loop)

And when _producer_handler receive this event it trying to close websocket connection and set self._connection.started = False (essentially the same behavior as in _supervise)

  async def _producer_handler(self, ws):
      while True:
          try:
              event = await self.invoke_queue.get()
              if event is not None:
                  if event.type == 'INVOKE':
                      await ws.send(dumps(event.message))
                  elif event.type == 'CLOSE':
                      await ws.close()
                      while ws.open is True:
                          await asyncio.sleep(0.1)
                      else:
                          self._connection.started = False
                          break
              else:
                  break
              self.invoke_queue.task_done()
          except Exception as e:
              raise e

So as I understand this bug can be fixed by adding self._connection.started = False to some error handler of websocket connection. I tried to find it myself, but couldn't. That is why I decided to detail my observations for you.

Hope it helps!

@theOehrly
Copy link
Owner Author

@supremeremixx thank you for investigating this more. This is helpful information, for sure. Maybe I'll be able to fix it soon.

Also, it would probably be best to raise an exception here instead of simply returning from the function. After all, this is not an intended or expected way to close the connection, but rather it is some sort of connection error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants