Fix TestOnStartWithArgsThenStop and make service tests more reliable #39153


Merged
merged 3 commits into from
Jul 15, 2020

Conversation

danmoseley
Member

@danmoseley danmoseley commented Jul 12, 2020

Fixes #38945

There were two bugs:

  1. The test service does not synchronize its connect and start messages with each other, but one of the tests expects them to arrive in order. This is the cause of the failure. I experimented with making the start-message write a continuation of the connect-message write, but this problem only occurs for these two messages: all other messages can be handled sequentially, because the test can wait on the matching SCM status. The simplest fix is to recognize that these two messages are unordered and have the test accept either ordering.

  2. The test service did not wait for the client to connect before writing to the pipe. The background: when test services are torn down, the test service installer issues the Stop command to the service via the SCM, which causes the test service to write a stop message to a pipe that no longer has a client planning to connect to it. In a previous change we therefore stopped waiting on the connection, which introduced this flakiness. Fix: wait on the client connection unless the message is a stop message. (Tests that wait on a stop message all wait on some earlier message first, so the test service will always have a connected pipe when it writes to such tests.)
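The two fixes can be sketched together. This is a minimal Python illustration, not the actual C# test code from the PR; the message names, the `TestServicePipe` class, and `check_first_two` are hypothetical stand-ins for the real pipe and test helpers.

```python
import threading

def check_first_two(messages):
    """Fix 1 (sketch): 'connect' and 'start' are written by unsynchronized
    code paths, so accept them in either order; later messages stay ordered."""
    if set(messages[:2]) != {"connect", "start"}:
        raise AssertionError(f"expected connect/start in any order, got {messages[:2]}")
    return messages[2:]

class TestServicePipe:
    """Fix 2 (sketch): block until a client connects before writing, except
    for 'stop', which may be written during teardown when no client will
    ever connect."""
    def __init__(self):
        self._connected = threading.Event()
        self.written = []

    def client_connect(self):
        self._connected.set()

    def write(self, message, timeout=5.0):
        if message != "stop" and not self._connected.wait(timeout):
            raise TimeoutError("no client connected to the test pipe")
        self.written.append(message)

# Either interleaving of the first two messages passes:
assert check_first_two(["start", "connect", "pause"]) == ["pause"]
assert check_first_two(["connect", "start", "pause"]) == ["pause"]

# A 'stop' write during teardown does not block waiting for a client:
pipe = TestServicePipe()
pipe.write("stop", timeout=0.1)
assert pipe.written == ["stop"]
```

The key design point is that the ordering relaxation applies only to the first two messages; everything after them can still be asserted strictly because the test gates each step on the matching SCM status.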

Using a debugger to figure out what is happening is painful because of the various threads and processes; tracing is far more convenient and useful, so I added a bunch of tracing to the test code. With luck this is the last timing problem in these tests, but we've had a series of such problems, so I've left the tracing in for next time, disabled by default.
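The off-by-default tracing follows a common pattern; here is a Python sketch of it (the environment-variable name and output format are hypothetical, and the real tracing lives in the C# test code):

```python
import os
import sys
import time

# Disabled by default; flip the (hypothetical) env var to turn it on.
TRACE_ENABLED = os.environ.get("SERVICE_TEST_TRACE") == "1"

def trace(message):
    """Timestamped, pid-tagged trace line: when a failure spans several
    processes and threads, interleaved traces are easier to follow than
    stepping through each process in a debugger."""
    if TRACE_ENABLED:
        print(f"[{time.monotonic():9.3f} pid={os.getpid()}] {message}",
              file=sys.stderr, flush=True)

trace("service wrote 'connect' message")  # silent unless enabled
```

Writing to stderr with `flush=True` keeps lines from different processes from being lost or reordered by buffering, which matters when reconstructing a cross-process timeline.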

Also added a comment about Dispose, which was confusing.

@danmoseley danmoseley requested a review from stephentoub July 13, 2020 22:35
@MattGal
Member

MattGal commented Jul 13, 2020

@danmosemsft ah, I see: "upload" in this context means AzDO reporting and XUnit reporting (which is done in the context of the work item), and it indeed seems to have hung inside the Azure Python SDK. I'll create an issue for this on the arcade backlog; if you see a bunch more, please let me know and we can bump the priority.

If you're not actively using XUnit result ingestion in Kusto, you could remove usage of xunit-reporter.py entirely by flipping a build property bool; let me know if you want to talk about that.

@danmoseley
Member Author

Thanks @MattGal

> If you're not actively using XUnit result ingestion in Kusto, you could remove usage of xunit-reporter.py entirely by flipping a build property bool; let me know if you want to talk about that.

@ViktorHofer can you help answer that?

@ViktorHofer
Member

I know that none of our tools rely on it, but maybe @wfurt's dashboard does?

@MattGal
Member

MattGal commented Jul 14, 2020

I also expect that, while this seems to be a bug in the azure-storage-blob Python SDK, this specific failure is going to be pretty rare. Either way, if you actively use and want the Kusto XUnit support, I'd appreciate a quick comment in dotnet/arcade#5786 before we discuss it at Thursday's triage.

@wfurt
Member

wfurt commented Jul 14, 2020

I may not understand the comment properly, @MattGal. I don't think we care about the reporter, but we do use test results from Kusto: for the dashboard, as well as for the process @alnikola put together to monitor and analyze test failures so we can stay on top of them. This is also handy for trends and for closing old test failures.
cc: @karelz

@MattGal
Member

MattGal commented Jul 14, 2020

> I may not understand the comment properly, @MattGal. I don't think we care about the reporter, but we do use test results from Kusto: for the dashboard, as well as for the process @alnikola put together to monitor and analyze test failures so we can stay on top of them. This is also handy for trends and for closing old test failures.
> cc: @karelz

If you like XUnit Facts in your Kusto, you like xunit-reporter.py. Some folks on our team are less enthusiastic about it because it can be quite expensive to put that much data into Kusto. If you just care about work items and their exit codes, you don't care about the reporter. Ping me on Teams or set up a quick call if you want to go deeper.

@danmoseley
Member Author

ping @Anipik

@danmoseley danmoseley merged commit ee41d4a into dotnet:master Jul 15, 2020
@danmoseley danmoseley deleted the svctests branch July 15, 2020 00:33
@karelz karelz added this to the 5.0.0 milestone Aug 18, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020
@danmoseley danmoseley restored the svctests branch December 22, 2020 05:07
@danmoseley danmoseley deleted the svctests branch September 30, 2022 16:36