Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent test failures due to sockets remaining open on zmq adapter #162

Open
abbiemery opened this issue Jul 19, 2023 · 2 comments
Open
Assignees
Labels
bug Something isn't working

Comments

@abbiemery
Copy link
Collaborator

I've noticed a few times now that there are sporadic failures of CI jobs in the pytest stage due to socket issues. It is hard to replicate since there is no obvious decernable pattern to when they fail, but this needs investigating.

@rosesyrett rosesyrett self-assigned this Jul 19, 2023
@rosesyrett
Copy link
Contributor

I have investigated this, and I've narrowed down the cause of this issue. Importantly, I've found a fix for it:

The problem

This stems from the following test:

@pytest.mark.asyncio
async def test_socket_cleaned_up_on_cancel(
    mock_device: Device,
    mock_raise_interrupt: RaiseInterrupt,
) -> None:
    adapter_a = ZeroMqPushAdapter()
    adapter_b = ZeroMqPushAdapter()
    for adapter in (adapter_a, adapter_b):
        task = asyncio.create_task(
            adapter.run_forever(
                mock_device,
                mock_raise_interrupt,
            )
        )
        await adapter.send_message([b"test"])
        task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            pass
        assert task.done()

What seems to happen is, occasionally when the for loop gets to the await adapter.send_message([]) line for the 2nd time, the socket from the previous run of the loop hasn't been shut down properly.

The fix

I've found threads like aio-libs/aiozmq#72 (comment) and https://stackoverflow.com/questions/45805714/how-to-close-properly-with-aiozmq-zmq, which led me to find out about zmq's LINGER property, documented (here)[http://api.zeromq.org/2-1%3azmq-setsockopt#toc15]. This suggests that by default, when zmq connections are made and a message is sent (but hasn't been recieved by the peer), the message will just linger - to me this suggests that the connection will not fully close.

I've tried to fix this by doing:

async def create_zmq_push_socket(host: str, port: int) -> aiozmq.ZmqStream:
    addr = f"tcp://{host}:{port}"
    stream = await aiozmq.create_zmq_stream(zmq.PUSH, connect=addr, bind=addr)
    transport = cast(aiozmq.ZmqTransport, stream.transport)
    transport.setsockopt(zmq.LINGER, 0)

    return stream

But that hasn't seemed to help anything. So, I found another fix; aborting the stream transport. i.e. force killing it. Maybe in future we can investigate exactly why this is happening but... it's really quite annoying to debug aiozmq library as it's not consistently typed.

@rosesyrett
Copy link
Contributor

After much deliberation, I've decided the fix to this is to move from aiozmq to pyzmq, as the latter is supported for later versions of python and has had successful CI jobs for recent releases (https://github.com/zeromq/pyzmq/tree/v25.0.2)

@abbiemery abbiemery added the bug Something isn't working label Sep 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants