CentOS8: Errors while making cache if GPG keys change #67

pfuntner · 2020-02-05T14:47:38Z

I'm seeing an error using GCP CentOS8 instances for my master and worker nodes:

TASK [geerlingguy.kubernetes : Make cache if Kubernetes GPG key changed.] *******************************************************************************************************************************************************************
Wednesday 05 February 2020  14:44:51 +0000 (0:00:01.290)       0:03:34.103 ****
fatal: [54.161.207.59]: FAILED! => {"changed": true, "cmd": ["yum", "-q", "makecache", "-y", "--disablerepo=*", "--enablerepo=kubernetes"], "delta": "0:00:00.726136", "end": "2020-02-05 14:44:56.576533", "msg": "non-zero return code", "rc": -13, "start": "2020-02-05 14:44:55.850397", "stderr": "Importing GPG key 0xA7317B0F:\n Userid     : \"Google Cloud Packages Automatic Signing Key <[email protected]>\"\n Fingerprint: D0BC 747F D8CA F711 7500 D6FA 3746 C208 A731 7B0F\n From       : https://packages.cloud.google.com/yum/doc/yum-key.gpg", "stderr_lines": ["Importing GPG key 0xA7317B0F:", " Userid     : \"Google Cloud Packages Automatic Signing Key <[email protected]>\"", " Fingerprint: D0BC 747F D8CA F711 7500 D6FA 3746 C208 A731 7B0F", " From       : https://packages.cloud.google.com/yum/doc/yum-key.gpg"], "stdout": "", "stdout_lines": []}
...

When I ssh into one of the instances and run the command directly, it seems to work ok:

[root@ip-172-31-51-134 ~]# yum -q makecache -y --disablerepo=\* --enablerepo=kubernetes
Importing GPG key 0xBA07F4FB:
 Userid     : "Google Cloud Packages Automatic Signing Key <[email protected]>"
 Fingerprint: 54A6 47F9 048D 5688 D7DA 2ABE 6A03 0B21 BA07 F4FB
 From       : https://packages.cloud.google.com/yum/doc/yum-key.gpg
Importing GPG key 0x3E1BA8D5:
 Userid     : "Google Cloud Packages RPM Signing Key <[email protected]>"
 Fingerprint: 3749 E1BA 95A8 6CE0 5454 6ED2 F09C 394C 3E1B A8D5
 From       : https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
[root@ip-172-31-51-134 ~]# echo $?
0

The GCP keys and fingerprints are different are different from the failure but I don't know what the significance is. If I start over from scratch with new instances, it fails at the same point with the same key and fingerprint from the failure.

The text was updated successfully, but these errors were encountered:

timmay75 · 2020-03-05T19:07:12Z

Hi @pfuntner, I have the same issue and its blocking me from using this on Centos 8. Have you found any way around it? Im pretty new to this deep usage of Ansible and im not sure if there is a way to not run this task. I tried to do the gpg in a pretask but that doesnt seem to work either. Any new would be greatly appreciated.

geerlingguy · 2020-05-07T15:55:48Z

I'm also seeing this now, only on CentOS 8 builds, in Travis CI:

Importing GPG key 0xA7317B0F:
 Userid     : "Google Cloud Packages Automatic Signing Key <[email protected]>"
 Fingerprint: D0BC 747F D8CA F711 7500 D6FA 3746 C208 A731 7B0F
 From       : https://packages.cloud.google.com/yum/doc/yum-key.gpg
Importing GPG key 0xBA07F4FB:
 Userid     : "Google Cloud Packages Automatic Signing Key <[email protected]>"
 Fingerprint: 54A6 47F9 048D 5688 D7DA 2ABE 6A03 0B21 BA07 F4FB
 From       : https://packages.cloud.google.com/yum/doc/yum-key.gpg"

On the task geerlingguy.kubernetes : Make cache if Kubernetes GPG key changed..

This started happening sometime between April 22 and April 30, according to cron-triggered CI builds: https://travis-ci.org/github/geerlingguy/ansible-role-kubernetes/builds

geerlingguy · 2020-05-07T16:00:42Z

Weird... if I run it locally, it passes. Exact same CI test.

I'm going to re-run the last failed build and see if maybe it's something in the Travis CI environment?

geerlingguy · 2020-05-07T16:16:18Z

I think this could possibly be related to the test container. Locally I'm running a 3-month-old image pulled from Docker Hub, while the latest is from 17 days ago...

And in the centos8 image build CI task on Travis CI, I'm seeing the error (https://travis-ci.com/github/geerlingguy/docker-centos8-ansible/jobs/326423010#L631):

Error: GPG check FAILED

Going to debug there.

geerlingguy · 2020-05-07T17:13:38Z

After updating to the latest version of the centos8 image, which seems to have the initial GPG key issue fixed, I'm getting:

--> Action: 'idempotence'
614ERROR: Idempotence test failed because of the following tasks:
615* [instance] => geerlingguy.docker : Add Docker GPG key.
616* [instance] => geerlingguy.kubernetes : Add Kubernetes GPG keys.
617* [instance] => geerlingguy.kubernetes : Add Kubernetes GPG keys.
618* [instance] => geerlingguy.kubernetes : Make cache if Kubernetes GPG key changed.

See failed build: https://travis-ci.org/github/geerlingguy/ansible-role-kubernetes/jobs/683881462

So it seems something's amiss with keys in yum in CentOS 8, but only on Travis CI in my case (and it sounds like also on @pfuntner's servers).

@pfuntner / @timmay75 - What kind of servers/instances are you deploying against?

geerlingguy · 2020-05-07T17:17:54Z

Actually, now I'm able to reproduce the issue locally:

--> Action: 'idempotence'
ERROR: Idempotence test failed because of the following tasks:
* [instance] => geerlingguy.docker : Add Docker GPG key.
* [instance] => geerlingguy.kubernetes : Add Kubernetes GPG keys.
* [instance] => geerlingguy.kubernetes : Add Kubernetes GPG keys.
* [instance] => geerlingguy.kubernetes : Make cache if Kubernetes GPG key changed.

geerlingguy · 2020-05-07T17:38:48Z

After running the playbook a number of times, I see the keys just keep importing over and over again:

# rpm -qi gpg-pubkey-\* | grep -E ^Packager
error: rpmdbNextIterator: skipping h#     173 blob size(4836): BAD, 8 + 16 * il(70) + dl(3708)
error: rpmdb: damaged header #173 retrieved -- skipping.
error: rpmdb: damaged header #173 retrieved -- skipping.
Packager    : Docker Release (CE rpm) <[email protected]>
Packager    : Google Cloud Packages Automatic Signing Key <[email protected]>
Packager    : Google Cloud Packages Automatic Signing Key <[email protected]>
Packager    : Google Cloud Packages RPM Signing Key <[email protected]>
Packager    : Docker Release (CE rpm) <[email protected]>
Packager    : Google Cloud Packages Automatic Signing Key <[email protected]>
Packager    : Google Cloud Packages Automatic Signing Key <[email protected]>
Packager    : Google Cloud Packages RPM Signing Key <[email protected]>
Packager    : Docker Release (CE rpm) <[email protected]>
Packager    : Google Cloud Packages Automatic Signing Key <[email protected]>
Packager    : Google Cloud Packages Automatic Signing Key <[email protected]>
Packager    : Google Cloud Packages RPM Signing Key <[email protected]>
Packager    : Docker Release (CE rpm) <[email protected]>
Packager    : Google Cloud Packages Automatic Signing Key <[email protected]>
Packager    : Google Cloud Packages Automatic Signing Key <[email protected]>
Packager    : Google Cloud Packages RPM Signing Key <[email protected]>
Packager    : Docker Release (CE rpm) <[email protected]>
Packager    : Google Cloud Packages Automatic Signing Key <[email protected]>
Packager    : Google Cloud Packages Automatic Signing Key <[email protected]>
Packager    : Google Cloud Packages RPM Signing Key <[email protected]>
Packager    : Docker Release (CE rpm) <[email protected]>
Packager    : Google Cloud Packages Automatic Signing Key <[email protected]>
Packager    : Google Cloud Packages Automatic Signing Key <[email protected]>
Packager    : Google Cloud Packages RPM Signing Key <[email protected]>

geerlingguy · 2020-05-07T17:41:04Z

I can't even rebuild the rpmdb:

[root@instance ~]# rm -f /var/lib/rpm/.*.lock
[root@instance ~]# rm -f /var/lib/rpm/__db.*
[root@instance ~]# rpm --rebuilddb
error: rpmdbNextIterator: skipping h#     173 blob size(4836): BAD, 8 + 16 * il(70) + dl(3708)
error: failed to replace old database with new database!
error: replace files in /var/lib/rpm with files from /var/lib/rpmrebuilddb.66870 to recover

geerlingguy · 2020-05-07T17:41:30Z

Possibly related? error: rpmdb: damaged header #173 retrieved -- skipping.

geerlingguy · 2020-05-07T17:46:17Z

No clue what's going on here, but also see jellyfin/jellyfin#2563

geerlingguy · 2020-05-07T17:49:01Z

Someone else also ran into the corrupt db issue: ansible/awx#6306

timmay75 · 2020-05-07T18:28:50Z

Hi Jeff. Thanks for the reply. I was trying to get the example you had out there (https://github.com/geerlingguy/ansible-for-devops/tree/master/kubernetes) working with vagrant and centos8. I seem to remember being able to get this going a work around but have forgotten it now. We ran into issues with the k8s internal flannel networking that was a core issue. So we ended up going another route with this. If you have any pointers on cent8 to get that working I would love to revisit it since another team member took it and wrote a new playbook from scratch and used calico and separated the roles out.

geerlingguy · 2020-05-07T21:25:00Z

It looks like the major issue might relate to using overlayfs—see Bug 1680124 - rpmdb --rebuilddb fails inside a container.

Basically, because I had a separate build layer in the Dockerfile that ran a yum -y update, and that seems to sometimes update packages like rpm, which trigger that create-tmp-then-rename bug, then the resulting image would fail the first time people tried doing yum/dnf/rpm activities. And the db couldn't be rebuilt since it was really corrupt.

So in geerlingguy/docker-centos8-ansible#7 I removed that separate yum -y update layer, and we'll see if that fixes things.

geerlingguy · 2020-05-07T23:10:01Z

Drat, I fixed the issue with yum and built-in GPG keys over in the issue linked above... but now we're back to:

    TASK [geerlingguy.kubernetes : Make cache if Kubernetes GPG key changed.] ******
505fatal: [instance]: FAILED! => {"changed": true, "cmd": ["yum", "-q", "makecache", "-y", "--disablerepo=*", "--enablerepo=kubernetes"], "delta": "0:00:00.509995", "end": "2020-05-07 23:08:22.993389", "msg": "non-zero return code", "rc": -13, "start": "2020-05-07 23:08:22.483394", "stderr": "Importing GPG key 0xA7317B0F:\n Userid     : \"Google Cloud Packages Automatic Signing Key <[email protected]>\"\n Fingerprint: D0BC 747F D8CA F711 7500 D6FA 3746 C208 A731 7B0F\n From       : https://packages.cloud.google.com/yum/doc/yum-key.gpg", "stderr_lines": ["Importing GPG key 0xA7317B0F:", " Userid     : \"Google Cloud Packages Automatic Signing Key <[email protected]>\"", " Fingerprint: D0BC 747F D8CA F711 7500 D6FA 3746 C208 A731 7B0F", " From       : https://packages.cloud.google.com/yum/doc/yum-key.gpg"], "stdout": "", "stdout_lines": []}

dmlb2000 · 2020-05-07T23:38:12Z

@geerlingguy So, is this issue being worked off of a different branch?

dmlb2000 · 2020-05-07T23:49:03Z

Oddly, the second run of a converge seems to get past the initial GPG key error. Is that the case for others?

dmlb2000 · 2020-05-08T00:48:17Z

Interesting related issue containers/podman#4431

Trying to change the task from a command to a shell and add the environment variable GPG_TTY=/dev/null

dmlb2000 · 2020-05-08T01:07:29Z

I'm wondering if the yum command need some sort of TTY input to open and close...

dmlb2000 · 2020-05-08T03:19:14Z

Interestingly enough I got the task to work by using expect. It's a tty input/output emulator through a scripting language.

- name: Install Expect
  package:
    name: ['expect']

- name: Make cache if Kubernetes GPG key changed.
  shell: |
    spawn yum -q makecache -y --disablerepo=* --enablerepo=kubernetes
    expect eof
    puts $expect_out(buffer)
    lassign [wait] pid spawnid os_error_flag value
    exit $value
  when: kubernetes_rpm_key is changed
  args:
    warn: false
    executable: /usr/bin/expect

The lassign statement captures the return information from the spawn process. The last return result value is the return code of the spawned process.

Don't get me wrong, this is a big jump to get the task to work properly. I think we shouldn't have to do this, there should be some other way that doesn't require a TTY.

dmlb2000 · 2020-05-08T15:55:50Z

Thought of another option last night, ignore_errors: true also seems to get past the issue. Though I'm not sure if that's inviting more problems later.

andybrook · 2020-05-16T22:03:44Z

In my testing it doesn't appear necessary to import the keys in the task "Add Kubernetes GPG keys", since they are already imported by the task "Ensure Kubernetes repository exists" thus the refresh of the cache in the task "Make cache if Kubernetes GPG key changed" doesn't happen, and doesn't error.

On freshly deployed VMs from the same template (template built and updated on the 9th of May).

Build 1
Ran a simple playbook copied from the readme letting it error, then reran the playbook successfully.

Build 2
Ran the same simple playbook copied from the readme letting it error, then ran the command in "Make cache if Kubernetes GPG key changed" manually, which resulted in output, (it gives no output when run a second time immediately after) then reran the playbook successfully.

Build 3
Deleted the task "Add Kubernetes GPG keys" from the role and ran the playbook successfully first go.

Comparing the output of the installed packages on the three builds they are identical, so there seems no reason to need to add the keys or update the cache on CentOS 8, could that task be made conditional on the ansible_distribution_major_version parameter not being 8?

Output from Build 1
[sysadmin@d-10-0-0-33 ~]$ dnf list installed | md5sum
c22a567f3e0f3567ddd1afb0f22f2e19 -

Output from Build 2
[sysadmin@d-10-0-0-34 ~]$ dnf list installed | md5sum
c22a567f3e0f3567ddd1afb0f22f2e19

Output from Build 3
[sysadmin@d-10-0-0-35 ~]$ dnf list installed | md5sum
c22a567f3e0f3567ddd1afb0f22f2e19

Very happy to provide further output or a PR if required, just didn't want to fill this issue with text!

santidhammo · 2020-05-29T12:08:45Z

I want to note that we are experiencing the same issue, and it does not have anything to do related to Kubernetes. Using The centos:8 docker image, Jenkins, a proxy and DNF causes exactly the same issue, it corrupts the database. This is indeed new (and wrong) behaviour.

Indeed, running dnf update and install sequences in a single layer resolves the issue, but still corrupts the database (therefore making any derived images impossible)

geerlingguy · 2020-07-08T15:41:51Z

Could this have been fixed this week upstream? I just got a passing test a few minutes ago...

geerlingguy added the bug Something isn't working label May 7, 2020

geerlingguy mentioned this issue May 7, 2020

Image builds failing with 'Error: GPG check FAILED' geerlingguy/docker-centos8-ansible#5

Closed

geerlingguy mentioned this issue May 7, 2020

Latest Docker Image Corrupted? geerlingguy/docker-centos8-ansible#7

Closed

geerlingguy added the planned label May 24, 2020

kotakanbe mentioned this issue Jan 18, 2021

fix(scan): add --nogpgcheck to dnf mod list to avoid Error: Cache-only enabled but no cache for * future-architect/vuls#1136

Merged

12 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CentOS8: Errors while making cache if GPG keys change #67

CentOS8: Errors while making cache if GPG keys change #67

pfuntner commented Feb 5, 2020

timmay75 commented Mar 5, 2020

geerlingguy commented May 7, 2020

geerlingguy commented May 7, 2020

geerlingguy commented May 7, 2020

geerlingguy commented May 7, 2020

geerlingguy commented May 7, 2020

geerlingguy commented May 7, 2020

geerlingguy commented May 7, 2020 •

edited

Loading

geerlingguy commented May 7, 2020

geerlingguy commented May 7, 2020

geerlingguy commented May 7, 2020

timmay75 commented May 7, 2020

geerlingguy commented May 7, 2020

geerlingguy commented May 7, 2020

dmlb2000 commented May 7, 2020

dmlb2000 commented May 7, 2020

dmlb2000 commented May 8, 2020

dmlb2000 commented May 8, 2020

dmlb2000 commented May 8, 2020 •

edited

Loading

dmlb2000 commented May 8, 2020

andybrook commented May 16, 2020

santidhammo commented May 29, 2020 •

edited

Loading

geerlingguy commented Jul 8, 2020

CentOS8: Errors while making cache if GPG keys change #67

CentOS8: Errors while making cache if GPG keys change #67

Comments

pfuntner commented Feb 5, 2020

timmay75 commented Mar 5, 2020

geerlingguy commented May 7, 2020

geerlingguy commented May 7, 2020

geerlingguy commented May 7, 2020

geerlingguy commented May 7, 2020

geerlingguy commented May 7, 2020

geerlingguy commented May 7, 2020

geerlingguy commented May 7, 2020 • edited Loading

geerlingguy commented May 7, 2020

geerlingguy commented May 7, 2020

geerlingguy commented May 7, 2020

timmay75 commented May 7, 2020

geerlingguy commented May 7, 2020

geerlingguy commented May 7, 2020

dmlb2000 commented May 7, 2020

dmlb2000 commented May 7, 2020

dmlb2000 commented May 8, 2020

dmlb2000 commented May 8, 2020

dmlb2000 commented May 8, 2020 • edited Loading

dmlb2000 commented May 8, 2020

andybrook commented May 16, 2020

santidhammo commented May 29, 2020 • edited Loading

geerlingguy commented Jul 8, 2020

geerlingguy commented May 7, 2020 •

edited

Loading

dmlb2000 commented May 8, 2020 •

edited

Loading

santidhammo commented May 29, 2020 •

edited

Loading