Skip to content

Containers: improved DockerRegistryManager for tests reliability #33955

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 20, 2023

Conversation

vlada-shubina
Copy link
Member

Related to: dotnet/sdk-container-builds#469
Addresses intermittent issue: #28442 (comment)

  • added overall retries on spawning the registry
  • added "health check" calls to registry to ensure it is loaded before further setup
  • added logging

@vlada-shubina vlada-shubina added the Area-Containers Related to dotnet SDK containers functionality label Jul 12, 2023
@vlada-shubina vlada-shubina requested a review from a team as a code owner July 12, 2023 15:09
@ghost ghost added the untriaged Request triage from a team member label Jul 12, 2023
@vlada-shubina vlada-shubina force-pushed the containers-test-improvements branch from a3ad2b0 to 6877b74 Compare July 12, 2023 15:12
}

logger.LogInformation($"Retrying after {spawnRegistryDelay} ms.");
Thread.Sleep(spawnRegistryDelay);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use the Thread class unless you're dealing with STA cases. For this particular case, consider Task.Delay(spawnRegistryDelay).ConfigureAwait(false);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please apply to all Thread.Sleep cases

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot use async calls in test class fixture init, therefore all the calls are sync.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this does not have to be an async call

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Task.Delay returns task to be awaited, there is no sync version of this method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can call Wait on the task to synchronously wait for completion. I'm not sure if there's any material difference between that and Thread.Sleep, though, especially in test code. Curious to learn more @donJoseLuis.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather not use Wait in XUnit due to very well-known issue xunit/xunit#864
we had this so many times in dotnet/sdk repo when tests are stuck, and underlying reason is Wait somewhere deep down in the code.

Comment on lines 59 to 75
foreach (string? tag in new[] { Net6ImageTag, Net7ImageTag, Net8PreviewImageTag })
{
logger.LogInformation($"Pulling image '{BaseImageSource}{RuntimeBaseImage}:{tag}'.");
ContainerCli.PullCommand(testOutput, $"{BaseImageSource}{RuntimeBaseImage}:{tag}")
.Execute()
.Should().Pass();

logger.LogInformation($"Tagging image '{BaseImageSource}{RuntimeBaseImage}:{tag}' as '{LocalRegistry}/{RuntimeBaseImage}:{tag}'.");
ContainerCli.TagCommand(testOutput, $"{BaseImageSource}{RuntimeBaseImage}:{tag}", $"{LocalRegistry}/{RuntimeBaseImage}:{tag}")
.Execute()
.Should().Pass();

logger.LogInformation($"Pushing image '{LocalRegistry}/{RuntimeBaseImage}:{tag}'.");
ContainerCli.PushCommand(testOutput, $"{LocalRegistry}/{RuntimeBaseImage}:{tag}")
.Execute()
.Should().Pass();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, why do we retry this entire loop and not individual iterations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problematic interaction is push, others are local to local docker daemon.
At first, i added retries also at pushlevel, however, if the registry responses on \v2 (checked in EnsureRegistryLoaded), this means it is healthy and should be able to handle push, so the last thought is adding another retry is overreacting for now. If suggested fix won't help, we can always add retries on push.

If the registry is not healthy, it is worth to restart it - handled in the outer retry loop.

@baronfel
Copy link
Member

Quick ping for @donJoseLuis to check on the status of your requested changes - I think that Vlada and Ladi have talked through them and come to an understanding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Containers Related to dotnet SDK containers functionality untriaged Request triage from a team member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants