Extending keyserver timeout to make alpine build more forgiving #3520

Merged


adamfarley (Contributor)

Right now, about 10% of Alpine Linux x64 builds fail due to timeouts when requesting keys from the keyserver.

Example: https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-alpine-linux-x64-temurin/277

The hope is that these timeouts are caused by temporary slowness due to server load, and that extending the timeout is the correct fix.

Test run: https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-alpine-linux-x64-temurin/279/
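The exact change made by this PR is not reproduced here. As a hedged sketch of one way a keyserver timeout can be raised, assuming GnuPG 2.2 or later (where `gpg --recv-keys` goes through `dirmngr`) and `dirmngr`'s `connect-timeout` option (worth confirming with `man dirmngr` on the build image), a longer per-connection timeout can be written into the `dirmngr.conf` of the GnuPG home the build uses. The helper name `set_keyserver_timeout` and the 60-second value are illustrative assumptions.

```sh
#!/bin/sh
# ASSUMPTION: GnuPG 2.2+ (keyserver traffic goes through dirmngr) and the
# dirmngr "connect-timeout" option; verify against `man dirmngr`.

# Hypothetical helper: write a longer connection timeout into dirmngr.conf
# for the given GNUPGHOME directory.
set_keyserver_timeout() {
  home_dir=$1   # directory that will be used as GNUPGHOME
  seconds=$2    # per-connection timeout, in seconds
  mkdir -p "$home_dir"
  chmod 700 "$home_dir"   # gpg warns about unsafe permissions otherwise
  # Append so that any dirmngr options already in the file are kept.
  printf 'connect-timeout %s\n' "$seconds" >> "$home_dir/dirmngr.conf"
}

# Example: a throwaway GNUPGHOME with an illustrative 60-second timeout.
GNUPGHOME=$(mktemp -d)
export GNUPGHOME
trap 'rm -rf "$GNUPGHOME"' EXIT
set_keyserver_timeout "$GNUPGHOME" 60
```

Because the build uses a fresh GnuPG home each time, the `dirmngr` that `gpg` starts for it should read this file on first use; an already-running `dirmngr` would need a restart (for example `gpgconf --kill dirmngr`) to pick up the change.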

Signed-off-by: Adam Farley <[email protected]>
adamfarley self-assigned this Nov 3, 2023
github-actions bot added the alpine-linux label (Issues that affect or relate to the Alpine LINUX OS) Nov 3, 2023
adamfarley (Contributor Author)

This PR may fix this issue: #3518

adamfarley enabled auto-merge (squash) November 3, 2023 14:05
And I'm asking the original committer for a review, to make sure
this isn't a deliberate duplication.

Signed-off-by: Adam Farley <[email protected]>
adamfarley (Contributor Author)

I noticed that the relevant code is duplicated. I don't see why this would be needed, so I'm removing it and tagging the original committer to make sure it isn't duplicated for a reason.

Also, I added a temporary echo command ("Proof of timeout change.") to make absolutely sure I'm modifying the right code here. I think I am, because it's preceded by the only instance of .gpg-temp creation whose name is followed by the PID ($$), and the output shows that this is what gets created just before the failure.
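For readers unfamiliar with the `.gpg-temp` naming, here is a minimal sketch of a per-process GnuPG home built from the shell's PID (`$$`), which produces names like the `/tmp/.gpg-temp.194` seen in the logs. This is an illustrative assumption, not the build script itself; presumably the PID suffix keeps concurrent runs on one machine from sharing a keyring.

```sh
#!/bin/sh
# Illustrative sketch (not the actual build script): a per-process GnuPG
# home named with the shell's PID, matching the ".gpg-temp.<pid>" names
# that appear in the build logs.
GNUPGHOME="/tmp/.gpg-temp.$$"
export GNUPGHOME
mkdir -p "$GNUPGHOME"
chmod 700 "$GNUPGHOME"            # gpg expects a private home directory
trap 'rm -rf "$GNUPGHOME"' EXIT   # remove the temporary keyring on exit
echo "GNUPGHOME=$GNUPGHOME"       # same form as the line printed in the logs
```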

adamfarley (Contributor Author)

Ok, we're definitely modifying the right code. Removing the echo cmd now.

https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-alpine-linux-x64-temurin/280/consoleFull

adamfarley requested a review from sxa November 3, 2023 14:44
Signed-off-by: Adam Farley <[email protected]>
adamfarley enabled auto-merge (squash) November 3, 2023 14:49
sxa (Member) left a comment

For clarification and historical reference, we're talking about an issue with the GPG signature check process on the ALSA download, which is giving this error, right?

18:05:42  GNUPGHOME=/tmp/.gpg-temp.194
18:05:42  gpg: keybox '/tmp/.gpg-temp.194/pubring.kbx' created
18:06:51  gpg: keyserver receive failed: Operation timed out

I'm somewhat surprised that that's timing out, and I worry it's part of a more severe problem - if it hasn't got something after 50 seconds, I'm not sure it'll get it after a longer time. How frequently has this occurred?

EDIT: Hmmm https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-alpine-linux-x64-temurin/278/consoleFull seems to show it taking about 50 seconds - I wonder what's making it so slow ...

18:05:51  gpg: keybox '/tmp/.gpg-temp.192/pubring.kbx' created
18:06:40  gpg: /tmp/.gpg-temp.192/trustdb.gpg: trustdb created
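The gaps in these logs can be checked directly from the timestamps: in the failing run, GNUPGHOME is printed at 18:05:42 and the timeout is reported at 18:06:51 (69 seconds); in run 278, the keybox is created at 18:05:51 and the trustdb at 18:06:40 (49 seconds, the "about 50 seconds" above). A small POSIX shell sketch of that arithmetic follows; the function names are illustrative.

```sh
#!/bin/sh
# Convert an HH:MM:SS console timestamp to seconds since midnight.
to_seconds() {
  h=${1%%:*}
  rest=${1#*:}
  m=${rest%%:*}
  s=${rest#*:}
  # Strip one leading zero so shell arithmetic doesn't treat "08" as octal.
  h=${h#0}; m=${m#0}; s=${s#0}
  echo $(( ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} ))
}

# Seconds from START to END (both HH:MM:SS), allowing for a wrap past midnight.
elapsed_seconds() {
  start=$(to_seconds "$1")
  end=$(to_seconds "$2")
  diff=$(( end - start ))
  if [ "$diff" -lt 0 ]; then
    diff=$(( diff + 86400 ))
  fi
  echo "$diff"
}

elapsed_seconds 18:05:42 18:06:51   # failing run; prints 69
elapsed_seconds 18:05:51 18:06:40   # run 278; prints 49
```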

sxa (Member) commented Nov 3, 2023

Based on #3518 (comment), could you add an uptime command to the script (probably before the gpg receive, so on the line before where you had the echo)? That way we can collect a few runs to analyse later and see whether the machine is getting overloaded at times. It's probably worth putting a comment before the uptime call that points at this issue/PR, with a reminder to remove it later :-)

sxa (Member) left a comment

> I noticed that the relevant code is duplicated. I don't see why this should be needed, so I'm removing it and tagging the original committer to make sure it's not duplicated for a reason.

Since you asked me directly elsewhere, I can confirm that yes, the de-duplication SGTM 👍🏻

andrew-m-leonard (Contributor) left a comment

lgtm

adamfarley (Contributor Author)

> ...which is giving this error, right?

> 18:06:51  gpg: keyserver receive failed: Operation timed out

Yup, that's the one.

> How frequently has this occurred?

Around 10% of the time. No apparent pattern.

> ...could you add an uptime command into the script (probably before the gpg receive so the line before where you had the echo)

This is slightly unclear, as the line where I had the "echo" is after the gpg receive (gpg ... --recv-keys ...).

I'm putting the uptime command into the script before the recv command. Do let me know if I've read that wrong.
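A hedged sketch of the agreed placement: record the machine load first, then run the key fetch, with a comment pointing back at this PR and issue as a reminder to remove the diagnostic later. `log_machine_load`, `recv_key_with_diagnostics`, and `KEYSERVER` are hypothetical names, not the build script's; the `/proc/loadavg` fallback is an assumption for images that lack `uptime`.

```sh
#!/bin/sh
# TEMPORARY diagnostic (see PR #3520 / issue #3518): remove once enough
# runs have been collected.
# Prints a timestamp plus load averages so that slow keyserver fetches can
# later be compared with how busy the build machine was.
log_machine_load() {
  printf '%s ' "$(date '+%H:%M:%S')"
  if command -v uptime >/dev/null 2>&1; then
    uptime
  else
    # Fallback for images without uptime: raw load averages on Linux.
    cat /proc/loadavg
  fi
}

# Hypothetical wrapper showing the order: load first, then the key fetch.
# KEYSERVER and the key ID argument are placeholders, not the script's values.
recv_key_with_diagnostics() {
  log_machine_load
  gpg --keyserver "$KEYSERVER" --recv-keys "$1"
}
```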

sxa (Member) commented Nov 9, 2023

> I'm putting the uptime command into the script before the recv command. Do let me know if I've read that wrong.

Yeah that's fine with me :-)

adamfarley merged commit ff9bcd1 into adoptium:master Nov 9, 2023
23 checks passed
karianna mentioned this pull request Jan 8, 2024
adamfarley deleted the extend_timeout_for_keyserver_actions branch July 10, 2024 14:37
Labels
alpine-linux Issues that affect or relate to the Alpine LINUX OS

3 participants