Indefinite blocking and unable to receive data #15

Open
kazikame opened this issue Dec 6, 2021 · 2 comments

kazikame commented Dec 6, 2021

Expected Behavior

TAS should send and receive packets without blocking indefinitely.

Current Behavior

TAS sometimes fails to send (or receive) the data from the last call to send(). This causes the receiver to wait indefinitely, even after the sender has stopped.

Steps to Reproduce

The bug is non-deterministic and may occur on either the server or the client; however, it can be reproduced fairly reliably using the following server and client in this repo:

  1. Compile using -Ofast and -march=native
  2. Run TAS on both the server and the client
  3. Server:
    LD_PRELOAD=<path-to-libtas_interpose.so> ./server <server-ip> <server-port>
  4. Client:
    LD_PRELOAD=<path-to-libtas_interpose.so> ./client <server-ip> <server-port>

Context (Environment)

The bug was discovered when running this software RDMA stack over TAS on two machines equipped with 10G Intel 82599 NICs. All performance benchmarks block indefinitely on TAS. They run fine on the regular kernel TCP stack.

@PabstMatthew

One thing we've also noticed about this issue is that decreasing --fp-poll-interval-tas seems to lower the probability of a successful run.

@rajathshashidhara
Member

TAS is designed for long-running applications: it implicitly assumes that applications poll the network stack for updates indefinitely. The sample code you've shared makes a fixed number of socket calls. As a result, the client fails to propagate its last transmission update to the fastpath, which causes the server to block indefinitely on recv(). This liveness condition is easy to satisfy: ensure that your code polls flextcp_context_poll() indefinitely. Note that the sockets API layer in TAS relies on this function to propagate updates to the fastpath.
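
For code that only uses the sockets API, one possible way to keep polling after the final send() is to wait for an application-level acknowledgement before exiting. The sketch below is only an illustration under stated assumptions: the one-byte acknowledgement protocol and the helper names send_all() and wait_for_ack() are hypothetical and not part of TAS or the sample code, and it assumes (based on the note above, not verified) that a blocking recv() through libtas_interpose.so keeps driving flextcp_context_poll() while it waits.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send the whole buffer, retrying on partial
 * writes and on EINTR. Returns 0 on success, -1 on error. */
static int send_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = send(fd, buf, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: after the final send(), stay inside the sockets
 * layer until the peer confirms receipt with a single byte.
 * ASSUMPTION: under TAS, this blocking recv() keeps polling the stack,
 * so the last transmission update can reach the fastpath.
 * Returns 0 once the byte arrives, -1 on error or if the peer closes
 * the connection without acknowledging. */
static int wait_for_ack(int fd)
{
    char ack;

    for (;;) {
        ssize_t n = recv(fd, &ack, 1, 0);
        if (n == 1)
            return 0;
        if (n == 0)
            return -1;
        if (errno != EINTR)
            return -1;
    }
}
```

The sender would call send_all() for its last message and then wait_for_ack() before closing the socket; the receiver would send one byte back once it has read the full message. This adds a round trip at shutdown, so it suits a reproduction or benchmark harness better than a latency-sensitive path.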
