Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing sconf not always propagated #346

Open
dotsimon opened this issue Oct 30, 2018 · 4 comments
Open

Changing sconf not always propagated #346

dotsimon opened this issue Oct 30, 2018 · 4 comments

Comments

@dotsimon
Copy link

When changing sconf, acceptors that are neither Ready nor Last can continue to receive and process requests using the old sconf.

If appmods and/or opaque are changed then this could lead to quite incorrect responses. On busy servers, these old acceptors might never get killed by the periodic cleanup.

https://github.com/dotsimon/yaws/tree/sconf_race suggests a solution that preserves the "soft-close" but should ultimately terminate those old acceptors.

@vinoski
Copy link
Collaborator

vinoski commented Nov 1, 2018

Thanks, I'll take a look.

@vinoski
Copy link
Collaborator

vinoski commented Nov 1, 2018

How do we duplicate the issue? How do we test the suggested fix?

@dotsimon
Copy link
Author

dotsimon commented Nov 1, 2018

Hi Steve. Sorry for not including test cases, but this was found in a regression test where yaws is only a small part.
But consider this case where an embedded server is started with one appmod: a.
The gserv process starts an acceptor (acc1), therefore in gserv_loop Last = acc1 and Ready = []
A request to /a/... is made. acc1 sends next to gserv which starts a new acceptor (acc2) such that now Last = acc1 and Ready = []
acc1 calls a:out/1 which in turn updates the sconf to add appmod b
gserv calls stop_ready which kills acc2 and then starts a new acceptor acc3
acc1 eventually finishes the request and sends done_client to gserv which adds the pid to Ready.
A request now to /b/... gets handled by acc3 ok
The next request to /b/ gets 404:ed by acc1 because it still has the old sconf

A variation on this is that accX has already served a request and is in gen_tcp:accept and in gserv's Ready, So in the original code accX still has trap_exit true making the exit in stop_ready ineffective and accX can therefore serve one more request (with the old sconf) before actually dying.

A worse case is like the above but accX is Last eventually dying due to a previous queued EXIT leaving gserv in a state where it thinks accX is waiting in accept but actually there are no processes to serve any requests!

My fixes were:

  1. move trap_exit so that EXIT during gen_tcp:accept would actually kill the process
  2. ignore done_client from acceptors that had been previously unlinked in stop_ready
  3. in stop_ready also handle the clients processing requests

In doing this, I did consider if having a client list instead of relying on process_info(links) would be better. And also if there was a point to sending Pid ! {Top, stop} when the exit(shutdown, Pid) had to be handled in the same way anyway in acceptor0 and aloop.

But, since I don't have a standalone yaws or (sadly) the time to develop test cases I did not want to make any significant reworking. For the same reasons, I did not make a PR from the sconf_race branch.

I hope this has been somewhat helpful. Please let me know if I can help clarify anything.

/Simon

@vinoski
Copy link
Collaborator

vinoski commented Nov 1, 2018

Thanks for the details, Simon. This information looks like it will be helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants