---
title: Restoring Tanzu Kubernetes Grid Integrated Edition
owner: TKGI
---
This topic describes how to use BOSH Backup and Restore (BBR) to restore
the BOSH Director, VMware Tanzu Kubernetes Grid Integrated Edition (TKGI) control plane, and Kubernetes clusters.
## <a id="overview"></a> Overview
In the event of a disaster, you might lose your environment's VMs and disks, as well as your IaaS network and load balancer resources.
You can re-create your environment, configured with your saved Tanzu Kubernetes Grid Integrated Edition Ops Manager Installation settings,
using your BBR backup artifacts.
Before restoring using BBR:
* Review the requirements listed in [Compatibility of Restore](#compatibility)
below.
* Complete all of the steps documented in
[Prepare to Restore a Backup](#prepare) below.
<br>
Use BBR to restore the following:
* The BOSH Director, see
[Restore the BOSH Director](#redeploy-restore-director) below.
* The Tanzu Kubernetes Grid Integrated Edition control plane, see
[Restore Tanzu Kubernetes Grid Integrated Edition Control Plane](#redeploy-restore-control-plane)
below.
* The Tanzu Kubernetes Grid Integrated Edition clusters, see
[Restore Tanzu Kubernetes Grid Integrated Edition Clusters](#redeploy-restore-clusters)
below.
## <a id="compatibility"></a> Compatibility of Restore
The following are the requirements for a backup artifact to be restorable to another environment:
* **Topology**: BBR requires the BOSH topology of a deployment to be the same in the restore
environment as it was in the backup environment.
* **Naming of instance groups and jobs**: For any deployment that implements the back up and restore
scripts, the instance groups and jobs must have the same names.
* **Number of instance groups and jobs**: For instance groups and jobs that have back up and restore scripts, the same number of instances must exist.
Additional considerations:
* **Limited validation**: BBR puts the backed up data into the corresponding instance groups and
jobs in the restored environment, but cannot validate the restore beyond that.
* **Same Cluster**: Currently, BBR supports the in-place restore of a cluster backup artifact onto the same cluster.
Migration from one cluster to another using a BBR backup artifact has not yet been validated.
<p class="note"><strong>Note:</strong> This section is for guidance only. Always validate your
backups by using the backup artifacts in a restore.
</p>
## <a id="prepare"></a> Prepare to Restore a Backup
<%= partial 'preparing-for-bbr' %>
## <a id="artifacts-jumpbox"></a> Transfer Artifacts to Your Jump Box
To restore the BOSH Director, the Tanzu Kubernetes Grid Integrated Edition control plane, or a cluster, you must transfer your BBR backup artifacts from your safe storage location to your jump box.
1. To copy an artifact onto a jump box, run the following SCP command:
```
scp -r LOCAL-PATH-TO-BACKUP-ARTIFACT JUMP-BOX-USER@JUMP-BOX-ADDRESS:
```
Where:
* `LOCAL-PATH-TO-BACKUP-ARTIFACT` is the path to your BBR backup artifact.
* `JUMP-BOX-USER` is the SSH user name of the jump box.
* `JUMP-BOX-ADDRESS` is the IP address, or hostname, of the jump box.
1. (Optional) Decrypt your backup artifact if the artifact is encrypted.
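How you decrypt depends entirely on how you encrypted the artifact when you stored it. As one hedged illustration, the round trip below assumes the artifact was encrypted with OpenSSL symmetric AES-256; the file names and passphrase are placeholders, and you should substitute whatever tool and key you actually used:

```shell
# Sketch only: assumes the artifact was encrypted with OpenSSL AES-256 at
# backup time. File names and the passphrase are illustrative placeholders.
printf 'backup bytes' > artifact.tar                          # stand-in artifact
openssl enc -aes-256-cbc -pbkdf2 -salt -pass pass:EXAMPLE \
  -in artifact.tar -out artifact.tar.enc                      # as encrypted at backup time
openssl enc -d -aes-256-cbc -pbkdf2 -pass pass:EXAMPLE \
  -in artifact.tar.enc -out artifact.restored.tar             # decrypt before restoring
cmp artifact.tar artifact.restored.tar && echo "decrypted OK"
```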
## <a id="redeploy-restore-director"></a> Restore the BOSH Director
If you lose your BOSH Director or Ops Manager environment, you must recreate the BOSH Director VM
before restoring the BOSH Director.
You can restore your BOSH Director configuration by using Tanzu Kubernetes Grid Integrated Edition
Ops Manager to import the installation settings that you saved while following the [Export Installation Settings](bbr-backup.html#export-opsman-settings) backup procedure.
To redeploy and restore your Ops Manager and BOSH Director, follow the procedures below.
### <a id='deploy-opsmanager'></a> Deploy Ops Manager
In the event of a disaster, you might lose your IaaS resources. You must recreate your IaaS resources before restoring using your BBR artifacts.
1. To recreate your IaaS resources, such as networks and load balancers, prepare your
environment for Tanzu Kubernetes Grid Integrated Edition by following the installation instructions
specific to your IaaS in [Installing Tanzu Kubernetes Grid Integrated Edition](installing.html).
1. After recreating IaaS resources, you must add those resources to Ops Manager
by performing the procedures in the [(Optional) Configure Ops Manager for New Resources](#config-new-resources) section.
### <a id='import-settings'></a>Import Installation Settings
<p class="note warning">
<strong>WARNING:</strong> After importing installation settings, do not click <strong>Apply Changes</strong>
in Ops Manager until instructed to do so in <a href="#deploy-bosh-director">Deploy the BOSH Director</a> or
<a href="#redeploy-restore-control-plane">Restore the Tanzu Kubernetes Grid Integrated Edition
Control Plane</a>.
</p>
You can import installation settings in two ways:
* Use the Ops Manager UI:
1. Access your new Ops Manager by navigating to `YOUR-OPS-MAN-FQDN` in a browser.
1. On the **Welcome to Ops Manager** page, click **Import Existing Installation**.
1. In the import panel, perform the following tasks:
* Enter the **Decryption Passphrase** in use when you exported the installation settings from Ops Manager.
* Click **Choose File** and browse to the installation zip file that you exported in [Back Up Installation Settings](bbr-backup.html#export-opsman-settings).
1. Click **Import**.
<p class="note">
<strong>Note:</strong> Some browsers do not provide the import process progress status, and might appear to hang.
The import process takes at least 10 minutes, and requires additional time for each restored Ops Manager tile.
</p>
1. A **Successfully imported installation** message appears when the import of all installation settings completes.
* Use the Ops Manager API:
1. To use the Ops Manager API to import installation settings, run the following command:
```
curl "https://OPS-MAN-FQDN/api/v0/installation_asset_collection" \
-X POST \
-H "Authorization: Bearer UAA-ACCESS-TOKEN" \
-F 'installation[file]=@installation.zip' \
-F 'passphrase=DECRYPTION-PASSPHRASE'
```
Where:
* `OPS-MAN-FQDN` is the fully-qualified domain name (FQDN) for your Ops Manager deployment.
* `UAA-ACCESS-TOKEN` is the UAA access token. For more information about how to retrieve this token,
see [Using the Ops Manager API](https://techdocs.broadcom.com/us/en/vmware-tanzu/platform/tanzu-operations-manager/3-0/tanzu-ops-manager/install-ops-man-api.html).
* `DECRYPTION-PASSPHRASE` is the decryption passphrase in use when you exported the installation
settings from Ops Manager.
### <a id="config-new-resources"></a> (Optional) Configure Ops Manager for New Resources
If you recreated IaaS resources such as networks and load balancers by following the steps in the
[Deploy Ops Manager](#deploy-opsmanager) section above, perform the following steps to update Ops Manager with your new resources:
1. Activate Ops Manager advanced mode. For more information, see
[How to Enable Advanced Mode in the Ops Manager](https://knowledge.broadcom.com/external/article/293516/)
in the Knowledge Base.
<p class="note">
<strong>Note:</strong> Ops Manager advanced mode allows you to make changes that are normally deactivated.
You might see warning messages when you save changes.
</p>
1. Navigate to the Ops Manager Installation Dashboard and click the BOSH Director tile.
1. Click **Create Networks** and update the network names to reflect the network names for the new environment.
1. If your BOSH Director had an external hostname, you must change it in **Director Config > Director Hostname**
to ensure it does not conflict with the hostname of the backed up Director.
1. Ensure that there are no outstanding warning messages in the BOSH Director tile, then deactivate Ops Manager advanced mode.
For more information, see [How to Enable Advanced Mode in the Ops Manager](https://knowledge.broadcom.com/external/article/293516/)
in the Knowledge Base.
<p class="note">
<strong>Note</strong>: A change in VM size or underlying hardware does not affect BBR's ability to
restore data, as long as adequate storage space to restore the data exists.
</p>
### <a id="bosh-state"></a> Remove BOSH State File
1. SSH into your Ops Manager VM. For more information, see the
[Log in to the Ops Manager VM with SSH](https://techdocs.broadcom.com/us/en/vmware-tanzu/platform/tanzu-operations-manager/3-0/tanzu-ops-manager/install-trouble-advanced.html#ssh)
section of the _Advanced Troubleshooting with the BOSH CLI_ topic.
1. To delete the `/var/tempest/workspaces/default/deployments/bosh-state.json` file, run the following on the Ops Manager VM:
```bash
sudo rm /var/tempest/workspaces/default/deployments/bosh-state.json
```
1. In a browser, navigate to your Ops Manager's fully-qualified domain name.
1. Log in to Ops Manager.
### <a id="deploy-bosh-director"></a> Deploy the BOSH Director
You can deploy the BOSH Director by itself in two ways:
* Use the Ops Manager UI:
1. Open the Ops Manager Installation Dashboard.
1. Click **Review Pending Changes**.
1. On the Review Pending Changes page, click the **BOSH Director** check box.
1. Click **Apply Changes**.
* Use the Ops Manager API:
1. Use the Ops Manager API to deploy the BOSH Director.
### <a id='restore-director'></a> Restore the BOSH Director
Restore the BOSH Director by running BBR commands on your jump box.
To restore the BOSH Director:
1. Ensure the Tanzu Kubernetes Grid Integrated Edition BOSH Director backup artifact is in the folder from which you run BBR.
1. Run the BBR restore command to restore the TKGI BOSH Director:
```
nohup bbr director --host BOSH-DIRECTOR-IP \
--username bbr --private-key-path PRIVATE-KEY-FILE \
restore \
--artifact-path PATH-TO-DIRECTOR-BACKUP
```
Where:
* `BOSH-DIRECTOR-IP` is the address of the BOSH Director. If the BOSH Director is public, this is a URL,
such as `https://my-bosh.xxx.cf-app.com`. Otherwise, it is the BOSH Director's internal IP address, which you can
retrieve as shown in [Retrieve the BOSH Director Address](#bosh-address).
* `PRIVATE-KEY-FILE` is the path to the private key file that you can create from `Bbr Ssh Credentials` as shown in
[Download the BBR SSH Credentials](#bbr-ssh-creds).
* `PATH-TO-DIRECTOR-BACKUP` is the path to the TKGI BOSH Director backup that you want to restore.
For example:
```console
$ nohup bbr director --host 10.0.0.5 \
--username bbr --private-key-path private.pem \
restore \
--artifact-path /home/10.0.0.5-abcd1234abcd1234
```
<p class="note">
<strong>Note</strong>: The BBR restore command can take a long time to complete.
The example command in this section uses <code>nohup</code>, and the restore process runs within your SSH session.
If you instead run the BBR command in a <code>screen</code> or <code>tmux</code> session, the task
runs separately from your SSH session and continues to run even if your SSH connection to the jump box fails.
</p>
1. If your BOSH Director restore fails, do one or more of the following:
* Run the command again, adding the `--debug` flag to activate debug logs. For more information,
see [BBR Logging](bbr-logging.html).
* Follow the steps in [Resolve a Failing BBR Restore Command](#recover-from-failing-command) below.
Be sure to complete the steps in [Clean Up After a Failed Restore](#manual-clean) below.
### <a id='remove-stale-cloud-ids'></a> Remove All Stale Deployment Cloud IDs
After the BOSH Director has been restored, you must reconcile its internal state with the state of the IaaS.
1. To determine the existing deployments in your environment, run the following command:
```
BOSH-CLI-CREDENTIALS bosh deployments
```
Where:
* `BOSH-CLI-CREDENTIALS` is the full `Bosh Commandline Credentials` value that you copied from the BOSH Director tile in [Download the BOSH Commandline Credentials](#bosh-cli-creds).
1. To reconcile the BOSH Director's internal state with the state of a single deployment, run the following command:
```
BOSH-CLI-CREDENTIALS bosh -d DEPLOYMENT-NAME -n cck \
--resolution delete_disk_reference \
--resolution delete_vm_reference
```
Where:
* `BOSH-CLI-CREDENTIALS` is the full `Bosh Commandline Credentials` value that you copied from the BOSH Director tile in [Download the BOSH Commandline Credentials](#bosh-cli-creds).
* `DEPLOYMENT-NAME` is a deployment name retrieved in the previous step.
1. Repeat the last command for each deployment in the IaaS.
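The repeat-per-deployment step can be sketched as a small shell loop. This is a dry-run illustration: `DEPLOYMENTS_TABLE` stands in for real `bosh deployments` output (the deployment names in it are invented), and the loop only echoes the `bosh cck` command it would run. Replace the echo with the real command, prefixed with your BOSH CLI credentials:

```shell
# Dry-run sketch: run "bosh cck" for every deployment name in the first
# column of "bosh deployments" output. The table below is sample data.
DEPLOYMENTS_TABLE='Name  Release(s)  Stemcell(s)  Team(s)
pivotal-container-service-453f2f  backup-and-restore-sdk/1.9.0  bosh-google-kvm-ubuntu-jammy-go_agent/1.75  -
service-instance_3839394  kubo/0.1.0  bosh-google-kvm-ubuntu-jammy-go_agent/1.75  -'

# Skip the header row; the deployment name is the first column.
echo "$DEPLOYMENTS_TABLE" | awk 'NR > 1 { print $1 }' | while read -r name; do
  # Dry run: print the command instead of executing it.
  echo "bosh -d $name -n cck --resolution delete_disk_reference --resolution delete_vm_reference"
done
```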
## <a id='redeploy-restore-control-plane'></a> Restore the Tanzu Kubernetes Grid Integrated Edition Control Plane
You must redeploy the Tanzu Kubernetes Grid Integrated Edition tile before restoring the Tanzu Kubernetes Grid Integrated Edition control plane.
By redeploying the Tanzu Kubernetes Grid Integrated Edition tile you create the VMs that constitute the control plane deployment.
To redeploy the Tanzu Kubernetes Grid Integrated Edition tile, do the following:
* [Determine the Required Stemcell](#determine-stemcell) needed by the tile.
* Upload that stemcell as described in [Upload Stemcells](#upload-stemcell).
* [Redeploy the Tanzu Kubernetes Grid Integrated Edition Control Plane](#redeploy-control-plane).
* [Restore the TKGI Control Plane](#restore-control-plane) from a BBR backup on top of the deployment.
### <a id='determine-stemcell'></a> Determine the Required Stemcell
Do either of the following procedures to determine the stemcell that TKGI uses:
* Review the Stemcell Library:
1. Open Ops Manager.
1. Click **Stemcell Library**.
1. Record the TKGI stemcell release number from the **Staged** column.
* Review a Stemcell List Using BOSH CLI:
1. To retrieve the stemcell release using the BOSH CLI, run the following command:
```
BOSH-CLI-CREDENTIALS bosh deployments
```
Where:
* `BOSH-CLI-CREDENTIALS` is the full `Bosh Commandline Credentials` value that you copied from the BOSH Director tile in [Download the BOSH Commandline Credentials](#bosh-cli-creds).
For example:
```console
$ bosh deployments
Using environment '10.0.0.5' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)
Name Release(s) Stemcell(s) Team(s)
pivotal-container-service-453f2faa3bd2e16f52b7 backup-and-restore-sdk/1.9.0 bosh-google-kvm-ubuntu-jammy-go_agent/1.75 -
...
```
<p class="note"><strong>Note:</strong> The TKGI tile can have at most two
stemcells: one Linux stemcell and one Windows stemcell.
</p>
For more information about stemcells in Ops Manager, see [Importing and Managing Stemcells](https://techdocs.broadcom.com/us/en/vmware-tanzu/platform/tanzu-operations-manager/3-0/tanzu-ops-manager/opsguide-managing-stemcells.html).
### <a id='upload-stemcell'></a> Upload Stemcells
To upload the stemcell used by your Tanzu Kubernetes Grid Integrated Edition tile:
1. Download the stemcell from [Broadcom Support](https://support.broadcom.com/group/ecx/productdownloads?subfamily=Stemcells%20(Ubuntu%20Xenial)).
1. Run the following command to upload the stemcell used by TKGI:
```
BOSH-CLI-CREDENTIALS bosh -d DEPLOYMENT-NAME \
--ca-cert PATH-TO-BOSH-SERVER-CERTIFICATE \
upload-stemcell \
--fix PATH-TO-STEMCELL
```
Where:
* `BOSH-CLI-CREDENTIALS` is the full `Bosh Commandline Credentials` value that you copied
from the BOSH Director tile in [Download the BOSH Commandline Credentials](#bosh-cli-creds).
* `PATH-TO-BOSH-SERVER-CERTIFICATE` is the path to the root CA certificate that you
downloaded in [Download the Root CA Certificate](#root-ca-cert).
* `PATH-TO-STEMCELL` is the path to your tile's stemcell.
1. To ensure the stemcells for all of your other installed tiles have been uploaded,
repeat the last step, running the `bosh upload-stemcell --fix PATH-TO-STEMCELL` command,
for each required stemcell that is different from the already uploaded TKGI stemcell.
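The repeat-per-stemcell step can be sketched as a loop over a local directory of stemcell tarballs. The directory name, file name, and the echo are illustrative placeholders; remove the echo and add your BOSH CLI credentials and `--ca-cert` flag to actually upload:

```shell
# Dry-run sketch: upload every stemcell tarball in a local directory
# with --fix. Directory and file names below are placeholders.
STEMCELL_DIR="./stemcells"
mkdir -p "$STEMCELL_DIR"
touch "$STEMCELL_DIR/example-stemcell.tgz"   # placeholder tarball for the sketch

for tarball in "$STEMCELL_DIR"/*.tgz; do
  # Dry run: print the command instead of executing it.
  echo "bosh upload-stemcell --fix $tarball"
done
```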
### <a id='redeploy-control-plane'></a> Redeploy the Tanzu Kubernetes Grid Integrated Edition Control Plane
To redeploy your Tanzu Kubernetes Grid Integrated Edition tile's control plane:
1. From the Ops Manager Installation Dashboard, navigate to **VMware Tanzu Kubernetes Grid Integrated Edition** > **Resource Config**.
1. Ensure the **Upgrade all clusters** errand is **Off**.
1. Ensure both **Instances** > **TKGI API** and
**Instances** > **TKGI Database** are configured as they had been
when the backup you are restoring was created.
1. Ensure that all errands needed by your system are set to run.
1. Return to the Ops Manager Installation Dashboard.
1. Click **Review Pending Changes**.
1. Review your changes. For more information, see [Reviewing Pending Product Changes](https://techdocs.broadcom.com/us/en/vmware-tanzu/platform/tanzu-operations-manager/3-0/tanzu-ops-manager/install-review-pending-changes.html).
1. Click **Apply Changes** to redeploy the control plane.
### <a id='restore-control-plane'></a> Restore the TKGI Control Plane
Restore the Tanzu Kubernetes Grid Integrated Edition control plane by running BBR commands on your jump box.
To restore the Tanzu Kubernetes Grid Integrated Edition control plane:
1. Ensure the Tanzu Kubernetes Grid Integrated Edition deployment backup artifact is in the folder from which you run BBR.
1. Run the BBR restore command to restore the TKGI control plane:
```
BOSH_CLIENT_SECRET=BOSH-CLIENT-SECRET \
nohup bbr deployment --target BOSH-TARGET \
--username BOSH-CLIENT --deployment DEPLOYMENT-NAME \
--ca-cert PATH-TO-BOSH-SERVER-CERT \
restore \
--artifact-path PATH-TO-DEPLOYMENT-BACKUP
```
Where:
* `BOSH-CLIENT-SECRET` is the value for `BOSH_CLIENT_SECRET` retrieved in
[Download the BOSH Commandline Credentials](#bosh-cli-creds).
* `BOSH-TARGET` is the value for `BOSH_ENVIRONMENT` retrieved in
[Download the BOSH Commandline Credentials](#bosh-cli-creds).
You must be able to reach the target address from the workstation where you run `bbr` commands.
* `BOSH-CLIENT` is the value for `BOSH_CLIENT` retrieved in
[Download the BOSH Commandline Credentials](#bosh-cli-creds).
* `DEPLOYMENT-NAME` is the deployment name retrieved in
[Locate the Tanzu Kubernetes Grid Integrated Edition Deployment Name](#locate-deploy-name).
* `PATH-TO-BOSH-SERVER-CERT` is the path to the root CA certificate that you downloaded in
[Download the Root CA Certificate](#root-ca-cert).
* `PATH-TO-DEPLOYMENT-BACKUP` is the path to the TKGI control plane backup that you want to restore.
For example:
```console
$ BOSH_CLIENT_SECRET=p455w0rd \
nohup bbr deployment --target bosh.example.com \
--username admin --deployment pivotal-container-0 \
--ca-cert bosh.ca.crt \
restore \
--artifact-path /home/pivotal-container-service_abcd1234abcd1234abcd-abcd1234abcd1234
```
<p class="note">
<strong>Note</strong>: The BBR restore command can take a long time to complete.
The command above uses <code>nohup</code>, and the restore process runs within your SSH session.
If you instead run the BBR command in a <code>screen</code> or <code>tmux</code> session, the task
runs separately from your SSH session and continues to run even if your SSH connection to the jump box fails.
</p>
1. If your Tanzu Kubernetes Grid Integrated Edition control plane restore fails, do one or more of the following:
* Run the command again, adding the `--debug` flag to activate debug logs. For more information,
see [BBR Logging](bbr-logging.html).
* Follow the steps in [Resolve a Failing BBR Restore Command](#recover-from-failing-command) below.
Be sure to complete the steps in [Clean Up After a Failed Restore](#manual-clean) below.
## <a id='redeploy-restore-clusters'></a> Redeploy and Restore Clusters
After restoring the Tanzu Kubernetes Grid Integrated Edition control plane,
perform the following steps to redeploy the TKGI-provisioned Kubernetes clusters
and restore their state from backup.
### <a id='redeploy-clusters'></a> Redeploy Clusters
Before restoring your TKGI-provisioned clusters, you must redeploy them to BOSH.
To redeploy TKGI-provisioned clusters:
* If you want to redeploy all clusters simultaneously,
see [Redeploy All Clusters](#redeploy-all-clusters).
* If you want to redeploy one cluster at a time,
see [Redeploy a Single Cluster](#redeploy-single-cluster).
#### <a id='redeploy-all-clusters'></a> Redeploy All Clusters
To redeploy all clusters:
1. In Ops Manager, navigate to the **Tanzu Kubernetes Grid Integrated Edition** tile.
1. Click **Errands**.
1. Ensure the **Upgrade all clusters** errand is **On**.
This errand redeploys all your TKGI-provisioned clusters.
1. Return to the **Installation Dashboard**.
1. Click **Review Pending Changes**, review your changes, and then click **Apply Changes**.
For more information, see [Reviewing Pending Product Changes](https://techdocs.broadcom.com/us/en/vmware-tanzu/platform/tanzu-operations-manager/3-0/tanzu-ops-manager/install-review-pending-changes.html).
#### <a id='redeploy-single-cluster'></a> Redeploy a Single Cluster
To redeploy a TKGI-provisioned cluster through the TKGI CLI:
1. Identify the names of your TKGI-provisioned clusters:
```
tkgi clusters
```
1. For each cluster you want to redeploy, run the following command:
```
tkgi upgrade-cluster CLUSTER-NAME
```
Where `CLUSTER-NAME` is the name of your Kubernetes cluster.
For more information, see [Upgrade Clusters](upgrade-clusters.html#upgrade-clusters).
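The per-cluster upgrade can be sketched as a loop over the names reported by `tkgi clusters`. This is a dry-run illustration: `CLUSTERS_TABLE` stands in for real CLI output (the cluster names and UUIDs are invented), and the loop only echoes the command it would run:

```shell
# Dry-run sketch: upgrade every cluster listed by "tkgi clusters".
# The table below is sample data; replace the echo with the real command.
CLUSTERS_TABLE='Name  Plan Name  UUID  Status  Action
cluster-1  small  aaaa1111-bbbb-2222-cccc-333344445555  succeeded  UPGRADE
cluster-2  small  dddd4444-eeee-5555-ffff-666677778888  succeeded  UPGRADE'

# Skip the header row; the cluster name is the first column.
echo "$CLUSTERS_TABLE" | awk 'NR > 1 { print $1 }' | while read -r cluster; do
  # Dry run: print the command instead of executing it.
  echo "tkgi upgrade-cluster $cluster"
done
```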
### <a id='restore-clusters'></a> Restore Clusters
After redeploying your TKGI-provisioned clusters, restore their stateless workloads and cluster state from backup by running the BOSH `restore` command from your jump box.
Stateless workloads are tracked in the cluster etcd database, which BBR backs up.
<p class="note warning">
<strong>Warning:</strong> BBR does not back up persistent volumes, load balancers, or other IaaS resources.</p>
<p class="note warning">
<strong>Warning:</strong> When you restore a cluster, etcd is stopped in the API server.
During this process, only currently-deployed clusters function, and you cannot create new workloads.
</p>
To restore a cluster:
1. Move the cluster backup artifact to a folder from which you will run the BBR restore process.
1. SSH into your jump box. For more information about the jump box, see
[Configure Your Jump Box](bbr-install.html#jumpbox-setup) in _Installing BOSH Backup and Restore_.
1. Run the following command:
```
BOSH_CLIENT_SECRET=BOSH-CLIENT-SECRET \
nohup bbr deployment --target BOSH-TARGET \
--username BOSH-CLIENT --deployment DEPLOYMENT-NAME \
--ca-cert PATH-TO-BOSH-SERVER-CERT \
restore \
--artifact-path PATH-TO-DEPLOYMENT-BACKUP
```
Where:
* `BOSH-CLIENT-SECRET` is the `BOSH_CLIENT_SECRET` property. This value is in the BOSH Director tile under **Credentials > Bosh Commandline Credentials**.
* `BOSH-TARGET` is the `BOSH_ENVIRONMENT` property. This value is in the BOSH Director tile under **Credentials > Bosh Commandline Credentials**.
You must be able to reach the target address from the workstation where you run `bbr` commands.
* `BOSH-CLIENT` is the `BOSH_CLIENT` property. This value is in the BOSH Director tile under **Credentials > Bosh Commandline Credentials**.
* `DEPLOYMENT-NAME` is the cluster BOSH deployment name that you recorded in
[Retrieve Your Cluster Deployment Names](#cluster-deployment-name) above.
* `PATH-TO-BOSH-SERVER-CERT` is the path to the root CA certificate that you downloaded in the [Download the Root CA Certificate](#root-ca-cert) section above.
* `PATH-TO-DEPLOYMENT-BACKUP` is the path to your deployment backup.
Make sure you have transferred your artifact to your jump box as described in [Transfer Artifacts to Your Jump Box](#artifacts-jumpbox) above.
For example:
```console
$ BOSH_CLIENT_SECRET=p455w0rd \
nohup bbr deployment \
--target bosh.example.com \
--username admin \
--deployment service-instance_3839394 \
--ca-cert bosh.ca.crt \
restore \
--artifact-path deployment-backup
```
<p class="note">
<strong>Note</strong>: The BBR restore command can take a long time to complete.
The BBR restore command above uses <code>nohup</code>, and the restore process runs within your SSH session.
If you instead run the BBR command in a <code>screen</code> or <code>tmux</code> session, the task
runs separately from your SSH session and continues to run even if your SSH connection to the jump box fails.
</p>
1. To cancel a running `bbr restore`, see [Cancel a Restore](#cancel-restore) below.
1. After you restore a Kubernetes cluster, you must register its workers with their control plane nodes by following the [Register Restored Worker VMs](#register-nodes) steps below.
1. If your Tanzu Kubernetes Grid Integrated Edition cluster restore fails, do one or more of the following:
* Run the command again, adding the `--debug` flag to activate debug logs. For more information,
see [BBR Logging](bbr-logging.html).
* Follow the steps in [Resolve a Failing BBR Restore Command](#recover-from-failing-command) below.
Be sure to complete the steps in [Clean Up After a Failed Restore](#manual-clean) below.
## <a id='register-nodes'></a> Register Restored Worker VMs
After restoring a Kubernetes cluster,
you must register all of the cluster's worker nodes with their control plane nodes.
To register cluster worker nodes, complete the following:
1. [Delete Nodes](#delete-nodes)
1. [Restart kubelet](#restart-kubelet)
### <a id='delete-nodes'></a> Delete Nodes
To delete a cluster's restored nodes:
1. To determine your cluster's namespace, run the following command:
```
kubectl get all --all-namespaces
```
1. To retrieve the list of worker nodes in the cluster, run the following command:
```
kubectl get nodes -o wide
```
Document the worker node names listed in the `NAME` column.
Verify the worker nodes are all listed with a status of `NotReady`.
1. To delete a node, run the following:
```
kubectl delete node NODE-NAME
```
Where `NODE-NAME` is a node `NAME` returned by the `kubectl get nodes` command.
1. Repeat the preceding `kubectl delete node` step for each of your cluster's nodes.
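The delete-and-repeat steps can be sketched as a loop that selects only `NotReady` nodes. This is a dry-run illustration: `NODES_TABLE` stands in for real `kubectl get nodes` output (the node names are invented), and the loop only echoes the command it would run:

```shell
# Dry-run sketch: delete every node whose STATUS column is NotReady.
# The table below is sample data; replace the echo with the real command.
NODES_TABLE='NAME  STATUS  ROLES  AGE  VERSION
vm-1234abcd-0000-1111-2222-333344445555  NotReady  <none>  10d  v1.27.5
vm-1234abcd-6666-7777-8888-999900001111  NotReady  <none>  10d  v1.27.5'

# Skip the header row; select rows where column 2 is NotReady.
echo "$NODES_TABLE" | awk 'NR > 1 && $2 == "NotReady" { print $1 }' | while read -r node; do
  # Dry run: print the command instead of executing it.
  echo "kubectl delete node $node"
done
```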
### <a id='restart-kubelet'></a> Restart kubelet
To restart `kubelet` on your worker node VMs:
1. To restart `kubelet` on all of your cluster's worker node VMs, run the following command:
```
bosh ssh -d DEPLOYMENT-NAME worker -c 'sudo /var/vcap/bosh/bin/monit restart kubelet'
```
Where `DEPLOYMENT-NAME` is the cluster BOSH deployment name that you recorded in
[Retrieve Your Cluster Deployment Names](#cluster-deployment-name) above.
1. To confirm all worker nodes in your cluster have been restored to a `Ready` state,
run the following command:
```
kubectl get nodes -o wide
```
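After kubelet restarts, nodes move from `NotReady` to `Ready` over a few minutes, so you may need to poll. The check below is a sketch you could wrap in a retry loop; the sample tables stand in for real `kubectl get nodes` output, and the node names are invented:

```shell
# Sketch: a readiness check suitable for polling. Succeeds only when every
# data row of "kubectl get nodes" output reports STATUS "Ready".
all_ready() {
  echo "$1" | awk 'NR > 1 && $2 != "Ready" { bad = 1 } END { exit bad }'
}

# Sample tables standing in for real "kubectl get nodes" output.
RECOVERING='NAME  STATUS  ROLES  AGE  VERSION
vm-aaaa  Ready  <none>  10d  v1.27.5
vm-bbbb  NotReady  <none>  10d  v1.27.5'

RESTORED='NAME  STATUS  ROLES  AGE  VERSION
vm-aaaa  Ready  <none>  10d  v1.27.5
vm-bbbb  Ready  <none>  10d  v1.27.5'

all_ready "$RECOVERING" && echo "all Ready" || echo "still waiting"   # still waiting
all_ready "$RESTORED"   && echo "all Ready" || echo "still waiting"   # all Ready
```

In practice, you would replace the sample table with `"$(kubectl get nodes)"` and call `all_ready` inside an `until` loop with a `sleep` between attempts.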
## <a id="recover-from-failing-command"></a>Resolve a Failing BBR Restore Command
To resolve a failing BBR restore command:
1. Ensure that you set all the parameters in the command.
1. Ensure that the BOSH Director credentials are valid.
1. Ensure that the specified BOSH deployment or Director exists.
1. Ensure that the jump box can reach the BOSH Director.
1. Ensure the source backup artifact is compatible with the target BOSH deployment or Director.
1. If you see the error message `Directory /var/vcap/store/bbr-backup already exists on instance`,
run the relevant commands from the [Clean Up After a Failed Restore](#manual-clean) section of this
topic.
1. See the [BBR Logging](bbr-logging.html) topic.
## <a id='cancel-restore'></a>Cancel a Restore
If you must cancel a restore, perform the following steps:
1. Terminate the BBR process by pressing Ctrl-C and typing `yes` to confirm.
1. Perform the procedures in the [Clean Up After a Failed Restore](#manual-clean) section to support future restores. Stopping a restore can leave the system in an unusable state and prevent future restores.
## <a id="manual-clean"></a>Clean Up After a Failed Restore
If a BBR restore process fails, BBR might not have run the post-restore scripts, potentially leaving the instance in a locked state.
Additionally, the BBR restore folder might remain on the target instance and subsequent restore attempts might also fail.
* To resolve issues following a failed BOSH Director restore, run the following BBR command:
```
nohup bbr director \
--host BOSH-DIRECTOR-IP \
--username bbr \
--private-key-path PRIVATE-KEY-FILE \
restore-cleanup
```
Where:
* `BOSH-DIRECTOR-IP` is the address of the BOSH Director. If the BOSH Director is public,
this is a URL, such as `https://my-bosh.xxx.cf-app.com`. Otherwise, it is the BOSH Director's internal IP address,
which you can retrieve as shown in [Retrieve the BOSH Director Address](#bosh-address) above.
* `PRIVATE-KEY-FILE` is the path to the private key file that you can create from `Bbr Ssh Credentials`
as shown in [Download the BBR SSH Credentials](#bbr-ssh-creds) above.
For example:
```console
$ nohup bbr director \
--host 10.0.0.5 \
--username bbr \
--private-key-path private.pem \
restore-cleanup
```
* To resolve issues following a failed control plane restore, run the following BBR command:
```
BOSH_CLIENT_SECRET=BOSH-CLIENT-SECRET \
bbr deployment \
--target BOSH-TARGET \
--username BOSH-CLIENT \
--deployment DEPLOYMENT-NAME \
--ca-cert PATH-TO-BOSH-CA-CERT \
restore-cleanup
```
Where:
* `BOSH-CLIENT-SECRET` is the value for `BOSH_CLIENT_SECRET` retrieved in [Download the BOSH Commandline Credentials](#bosh-cli-creds) above.
* `BOSH-TARGET` is the value for `BOSH_ENVIRONMENT` retrieved in [Download the BOSH Commandline Credentials](#bosh-cli-creds) above.
You must be able to reach the target address from the workstation where you run `bbr` commands.
* `BOSH-CLIENT` is the value for `BOSH_CLIENT` retrieved in [Download the BOSH Commandline Credentials](#bosh-cli-creds) above.
* `DEPLOYMENT-NAME` is the name retrieved in [Retrieve Your Cluster Deployment Name](#cluster-deployment-name) above.
* `PATH-TO-BOSH-CA-CERT` is the path to the root CA certificate that you downloaded in [Download the Root CA Certificate](#root-ca-cert) above.
For example:
```console
$ BOSH_CLIENT_SECRET=p455w0rd \
bbr deployment \
--target bosh.example.com \
--username admin \
--deployment pivotal-container-service-453f2f \
--ca-cert bosh.ca.crt \
restore-cleanup
```
* To resolve issues following a failed cluster restore, run the following BBR command:
```
BOSH_CLIENT_SECRET=BOSH-CLIENT-SECRET \
bbr deployment \
--target BOSH-TARGET \
--username BOSH-CLIENT \
--deployment DEPLOYMENT-NAME \
--ca-cert PATH-TO-BOSH-CA-CERT \
restore-cleanup
```
Where:
* `BOSH-CLIENT-SECRET` is the value for `BOSH_CLIENT_SECRET` retrieved in [Download the BOSH Commandline Credentials](#bosh-cli-creds).
* `BOSH-TARGET` is the value for `BOSH_ENVIRONMENT` retrieved in [Download the BOSH Commandline Credentials](#bosh-cli-creds).
You must be able to reach the target address from the workstation where you run `bbr` commands.
* `BOSH-CLIENT` is the value for `BOSH_CLIENT` retrieved in [Download the BOSH Commandline Credentials](#bosh-cli-creds).
* `DEPLOYMENT-NAME` is the name retrieved in [Retrieve Your Cluster Deployment Names](#cluster-deployment-name) above.
* `PATH-TO-BOSH-CA-CERT` is the path to the root CA certificate that you downloaded in [Download the Root CA Certificate](#root-ca-cert).
For example:
```console
$ BOSH_CLIENT_SECRET=p455w0rd \
bbr deployment \
--target bosh.example.com \
--username admin \
--deployment pivotal-container-service-453f2f \
--ca-cert bosh.ca.crt \
restore-cleanup
```