
Commit c9e78b5

OADP-2410: Workaround for restic restore failure due to changed PSA policy
1 parent 4203f9b commit c9e78b5


5 files changed, +91 -3 lines changed


backup_and_restore/application_backup_and_restore/backing_up_and_restoring/backing-up-applications.adoc

+10
@@ -2,6 +2,7 @@
 [id="backing-up-applications"]
 = Backing up applications
 include::_attributes/common-attributes.adoc[]
+include::_attributes/attributes-openshift-dedicated.adoc[]
 :context: backing-up-applications
 
 toc::[]
@@ -31,6 +32,15 @@ You can schedule backups by creating a `Schedule` CR instead of a `Backup` CR. S
 include::modules/oadp-creating-backup-cr.adoc[leveloffset=+1]
 include::modules/oadp-backing-up-pvs-csi.adoc[leveloffset=+1]
 include::modules/oadp-backing-up-applications-restic.adoc[leveloffset=+1]
+
+.Known issues
+
+{ocp} 4.14 enforces a pod security admission (PSA) policy that can hinder the readiness of pods during a Restic restore process.
+
+This issue has been resolved in the OADP 1.1.6 and OADP 1.2.2 releases. Upgrading to one of these releases is recommended.
+
+For more information, see xref:../../../backup_and_restore/application_backup_and_restore/troubleshooting.adoc#oadp-restic-restore-failing-psa-policy_oadp-troubleshooting[Restic restore partially failing on OCP 4.14 due to changed PSA policy].
+
 include::modules/oadp-using-data-mover-for-csi-snapshots.adoc[leveloffset=+1]
 
 [id="oadp-12-data-mover-ceph"]
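The known issue above ties the fix to specific patch releases. As a minimal sketch, a POSIX shell function like the following can classify an OADP version string against them. The function name `oadp_psa_fix_status` is hypothetical and not part of OADP; it covers only the 1.1 and 1.2 streams named in the note, and it does not query the cluster for the installed version.

```shell
# Hypothetical helper (not part of OADP): classifies an OADP version string
# (MAJOR.MINOR.PATCH) against the fixed releases named in the known issue.
#   return 0 -> at or above the fixed patch release for its stream
#   return 1 -> older patch release in the 1.1 or 1.2 stream (affected)
#   return 2 -> stream not covered by the note
oadp_psa_fix_status() {
  major=$(printf '%s\n' "$1" | cut -d. -f1)
  minor=$(printf '%s\n' "$1" | cut -d. -f2)
  patch=$(printf '%s\n' "$1" | cut -d. -f3)

  case "$major.$minor" in
    1.1) fixed_patch=6 ;;  # fixed in OADP 1.1.6
    1.2) fixed_patch=2 ;;  # fixed in OADP 1.2.2
    *) return 2 ;;
  esac

  # A missing patch component is treated as 0, for example "1.2" -> 1.2.0.
  [ "${patch:-0}" -ge "$fixed_patch" ]
}
```

For example, `oadp_psa_fix_status 1.2.1` returns 1, which indicates an installation that is still in the affected range of the 1.2 stream.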

backup_and_restore/application_backup_and_restore/troubleshooting.adoc

+2
@@ -2,6 +2,7 @@
 [id="troubleshooting"]
 = Troubleshooting
 include::_attributes/common-attributes.adoc[]
+include::_attributes/attributes-openshift-dedicated.adoc[]
 :context: oadp-troubleshooting
 :namespace: openshift-adp
 :local-product: OADP
@@ -87,6 +88,7 @@ include::modules/oadp-item-restore-timeouts.adoc[leveloffset=+2]
 include::modules/oadp-item-backup-timeouts.adoc[leveloffset=+2]
 include::modules/oadp-backup-restore-cr-issues.adoc[leveloffset=+1]
 include::modules/oadp-restic-issues.adoc[leveloffset=+1]
+include::modules/oadp-restic-restore-failing-psa-policy.adoc[leveloffset=+2]
 
 include::modules/migration-using-must-gather.adoc[leveloffset=+1]
 include::modules/oadp-monitoring.adoc[leveloffset=+1]
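The troubleshooting entry included here concerns pods that fail PSA admission during a Restic restore. To list the affected pods, a sketch like the following filters Velero log output. It assumes one log entry per line and the `error restoring pods/<namespace>/<name>` message form that appears in the module's sample error; the shorter form without a namespace is not matched. The function name `psa_denied_pods` is hypothetical.

```shell
# Hypothetical helper (not part of OADP or Velero): reads Velero log lines on
# stdin and prints <namespace>/<pod> for each pod that PSA denied during a
# restore. Assumes one log entry per line and the
# "error restoring pods/<namespace>/<name>: ..." message form.
psa_denied_pods() {
  grep 'violates PodSecurity' |
    grep -o 'error restoring pods/[^/]*/[^:]*' |
    sed 's#^error restoring pods/##' |
    sort -u
}
```

For example, assuming the Velero deployment is named `velero` in the `openshift-adp` namespace, you might run `oc logs deployment/velero -n openshift-adp | psa_denied_pods`.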

modules/oadp-backing-up-applications-restic.adoc

+2
@@ -39,3 +39,5 @@ spec:
 ...
 ----
 <1> Add `defaultVolumesToRestic: true` to the `spec` block.
+
+

modules/oadp-backup-restore-cr-issues.adoc

+3-3
@@ -55,12 +55,12 @@ You do not need to clean up the backup location because a `Backup` CR in progres
 [id="backup-cr-remains-partiallyfailed_{context}"]
 == Backup CR status remains in PartiallyFailed
 
-The status of a `Backup` CR without Restic in use remains in the `PartiallyFailed` phase and does not complete. A snapshot of the affiliated PVC is not created.
+The status of a `Backup` CR without Restic in use remains in the `PartiallyFailed` phase and does not complete. A snapshot of the affiliated PVC is not created.
 
 .Cause
 
 If the backup is created based on the CSI snapshot class, but the label is missing, CSI snapshot plugin fails to create a snapshot. As a result, the `Velero` pod logs an error similar to the following:
-+
+
 [source,text]
 ----
 time="2023-02-17T16:33:13Z" level=error msg="Error backing up item" backup=openshift-adp/user1-backup-check5 error="error executing custom action (groupResource=persistentvolumeclaims, namespace=busy1, name=pvc1-user1): rpc error: code = Unknown desc = failed to get volumesnapshotclass for storageclass ocs-storagecluster-ceph-rbd: failed to get volumesnapshotclass for provisioner openshift-storage.rbd.csi.ceph.com, ensure that the desired volumesnapshot class has the velero.io/csi-volumesnapshot-class label" logSource="/remote-source/velero/app/pkg/backup/backup.go:417" name=busybox-79799557b5-vprq
@@ -84,4 +84,4 @@ $ oc delete backup <backup> -n openshift-adp
 $ oc label volumesnapshotclass/<snapclass_name> velero.io/csi-volumesnapshot-class=true
 ----
 
-. Create a new `Backup` CR.
+. Create a new `Backup` CR.
@@ -0,0 +1,74 @@
+:_content-type: PROCEDURE
+[id="oadp-restic-restore-failing-psa-policy_{context}"]
+= Restic restore partially failing on OCP 4.14 due to changed PSA policy
+
+{ocp} 4.14 enforces a Pod Security Admission (PSA) policy that can hinder the readiness of pods during a Restic restore process.
+
+If a `SecurityContextConstraints` (SCC) resource is not found when a pod is created, and the pod is not configured to meet the required PSA standards, pod admission is denied.
+
+This issue occurs because of the order in which Velero restores resources: pods can be restored before the SCCs they depend on.
+
+.Sample error
+[source,text]
+----
+\"level=error\" in line#2273: time=\"2023-06-12T06:50:04Z\"
+level=error msg=\"error restoring mysql-869f9f44f6-tp5lv: pods\\\
+"mysql-869f9f44f6-tp5lv\\\" is forbidden: violates PodSecurity\\\
+"restricted:v1.24\\\": privileged (container \\\"mysql\\\
+" must not set securityContext.privileged=true),
+allowPrivilegeEscalation != false (containers \\\
+"restic-wait\\\", \\\"mysql\\\" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers \\\
+"restic-wait\\\", \\\"mysql\\\" must set securityContext.capabilities.drop=[\\\"ALL\\\"]), seccompProfile (pod or containers \\\
+"restic-wait\\\", \\\"mysql\\\" must set securityContext.seccompProfile.type to \\\
+"RuntimeDefault\\\" or \\\"Localhost\\\")\" logSource=\"/remote-source/velero/app/pkg/restore/restore.go:1388\" restore=openshift-adp/todolist-backup-0780518c-08ed-11ee-805c-0a580a80e92c\n
+velero container contains \"level=error\" in line#2447: time=\"2023-06-12T06:50:05Z\"
+level=error msg=\"Namespace todolist-mariadb,
+resource restore error: error restoring pods/todolist-mariadb/mysql-869f9f44f6-tp5lv: pods \\\
+"mysql-869f9f44f6-tp5lv\\\" is forbidden: violates PodSecurity \\\"restricted:v1.24\\\": privileged (container \\\
+"mysql\\\" must not set securityContext.privileged=true),
+allowPrivilegeEscalation != false (containers \\\
+"restic-wait\\\",\\\"mysql\\\" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers \\\
+"restic-wait\\\", \\\"mysql\\\" must set securityContext.capabilities.drop=[\\\"ALL\\\"]), seccompProfile (pod or containers \\\
+"restic-wait\\\", \\\"mysql\\\" must set securityContext.seccompProfile.type to \\\
+"RuntimeDefault\\\" or \\\"Localhost\\\")\"
+logSource=\"/remote-source/velero/app/pkg/controller/restore_controller.go:510\"
+restore=openshift-adp/todolist-backup-0780518c-08ed-11ee-805c-0a580a80e92c\n]",
+----
+
+.Solution
+
+. In your DPA custom resource (CR), check or set the `restore-resource-priorities` field on the Velero server so that `securitycontextconstraints` is listed before `pods` in the list of resources:
++
+[source,terminal]
+----
+$ oc get dpa -o yaml
+----
++
+.Example DPA CR
+[source,yaml]
+----
+# ...
+configuration:
+  restic:
+    enable: true
+  velero:
+    args:
+      restore-resource-priorities: 'securitycontextconstraints,customresourcedefinitions,namespaces,storageclasses,volumesnapshotclass.snapshot.storage.k8s.io,volumesnapshotcontents.snapshot.storage.k8s.io,volumesnapshots.snapshot.storage.k8s.io,datauploads.velero.io,persistentvolumes,persistentvolumeclaims,serviceaccounts,secrets,configmaps,limitranges,pods,replicasets.apps,clusterclasses.cluster.x-k8s.io,endpoints,services,-,clusterbootstraps.run.tanzu.vmware.com,clusters.cluster.x-k8s.io,clusterresourcesets.addons.cluster.x-k8s.io' <1>
+    defaultPlugins:
+    - gcp
+    - openshift
+----
+<1> If you have an existing restore resource priority list, combine it with the complete list shown here.
+
+. Ensure that the application pods meet the security standards described in link:https://access.redhat.com/solutions/7002730[Fixing PodSecurity Admission warnings for deployments] to prevent deployment warnings. If the application does not meet these standards, an error can occur regardless of the SCC.
+
+[NOTE]
+====
+This solution is temporary, and a permanent fix is under discussion.
+====
+
+
+[role="_additional-resources"]
+.Additional resources
+
+* link:https://access.redhat.com/solutions/7002730[Fixing PodSecurity Admission warnings for deployments]
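Step 1 of the solution depends on two things: `securitycontextconstraints` must precede `pods`, and an existing priority list must be combined with the complete list. The following POSIX shell sketch checks the ordering and performs one possible merge. The helper names are hypothetical. Entries before `-` are assumed to be restored first and entries after `-` last, following Velero's priority-list convention, and appending entries that appear only in your existing list to the end of the matching section is a design choice of this sketch, not OADP behavior.

```shell
# Hypothetical helpers (not part of OADP or Velero) for a comma-separated
# restore-resource-priorities value. Entries before "-" are treated as the
# high-priority section, entries after "-" as the low-priority section.

# Succeeds only if both securitycontextconstraints and pods are listed and
# securitycontextconstraints comes first.
scc_before_pods() {
  printf '%s\n' "$1" | tr ',' '\n' | awk '
    { gsub(/ /, "") }
    $0 == "securitycontextconstraints" && !scc { scc = NR }
    $0 == "pods" && !pods { pods = NR }
    END { exit !(scc && pods && scc < pods) }'
}

# Combines the complete list ($1) with an existing list ($2). Entries from the
# complete list keep their positions; entries found only in the existing list
# are appended to the matching high- or low-priority section.
merge_priorities() {
  printf '%s\n%s\n' "$1" "$2" | awk -F, '
    {
      low = 0                       # each input list starts in the high section
      for (i = 1; i <= NF; i++) {
        item = $i
        gsub(/ /, "", item)
        if (item == "") continue
        if (item == "-") { low = 1; continue }
        if (item in seen) continue  # first occurrence wins
        seen[item] = 1
        if (low) lo[++nlo] = item
        else hi[++nhi] = item
      }
    }
    END {
      out = ""
      for (i = 1; i <= nhi; i++) out = out (out == "" ? "" : ",") hi[i]
      if (nlo > 0) {
        out = out (out == "" ? "" : ",") "-"
        for (i = 1; i <= nlo; i++) out = out "," lo[i]
      }
      print out
    }'
}
```

For example, you might run `merge_priorities "<complete_list>" "<existing_list>"`, pass the result to `scc_before_pods` to confirm the ordering, and review the merged value before updating the DPA CR.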
