Changing sconf not always propagated #346

dotsimon · 2018-10-30T23:37:35Z

When changing sconf, acceptors that are neither Ready nor Last can continue to receive and process requests using the old sconf.

If appmods and/or opaque are changed then this could lead to quite incorrect responses. On busy servers, these old acceptors might never get killed by the periodic cleanup.

https://github.com/dotsimon/yaws/tree/sconf_race suggests a solution that preserves the "soft-close" but should ultimately terminate those old acceptors.

vinoski · 2018-11-01T01:47:35Z

Thanks, I'll take a look.

vinoski · 2018-11-01T02:22:51Z

How do we duplicate the issue? How do we test the suggested fix?

dotsimon · 2018-11-01T21:14:06Z

Hi Steve. Sorry for not including test cases, but this was found in a regression test where yaws is only a small part.
But consider this case where an embedded server is started with one appmod: a.
The gserv process starts an acceptor (acc1), therefore in gserv_loop Last = acc1 and Ready = []
A request to /a/... is made. acc1 sends next to gserv which starts a new acceptor (acc2) such that now Last = acc1 and Ready = []
acc1 calls a:out/1 which in turn updates the sconf to add appmod b
gserv calls stop_ready which kills acc2 and then starts a new acceptor acc3
acc1 eventually finishes the request and sends done_client to gserv which adds the pid to Ready.
A request now to /b/... gets handled by acc3 ok
The next request to /b/ gets 404:ed by acc1 because it still has the old sconf

A variation on this is that accX has already served a request and is in gen_tcp:accept and in gserv's Ready, So in the original code accX still has trap_exit true making the exit in stop_ready ineffective and accX can therefore serve one more request (with the old sconf) before actually dying.

A worse case is like the above but accX is Last eventually dying due to a previous queued EXIT leaving gserv in a state where it thinks accX is waiting in accept but actually there are no processes to serve any requests!

My fixes were:

move trap_exit so that EXIT during gen_tcp:accept would actually kill the process
ignore done_client from acceptors that had been previously unlinked in stop_ready
in stop_ready also handle the clients processing requests

In doing this, I did consider if having a client list instead of relying on process_info(links) would be better. And also if there was a point to sending Pid ! {Top, stop} when the exit(shutdown, Pid) had to be handled in the same way anyway in acceptor0 and aloop.

But, since I don't have a standalone yaws or (sadly) the time to develop test cases I did not want to make any significant reworking. For the same reasons, I did not make a PR from the sconf_race branch.

I hope this has been somewhat helpful. Please let me know if I can help clarify anything.

/Simon

vinoski · 2018-11-01T23:17:09Z

Thanks for the details, Simon. This information looks like it will be helpful.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Changing sconf not always propagated #346

Changing sconf not always propagated #346

dotsimon commented Oct 30, 2018

vinoski commented Nov 1, 2018

vinoski commented Nov 1, 2018

dotsimon commented Nov 1, 2018

vinoski commented Nov 1, 2018

Changing sconf not always propagated #346

Changing sconf not always propagated #346

Comments

dotsimon commented Oct 30, 2018

vinoski commented Nov 1, 2018

vinoski commented Nov 1, 2018

dotsimon commented Nov 1, 2018

vinoski commented Nov 1, 2018