RBD Mirroring

RBD images can be asynchronously mirrored between two Ceph clusters. This capability is available in two modes:

  • Journal-based: This mode uses the RBD journaling image feature to ensure point-in-time, crash-consistent replication between clusters. Every write to the RBD image is first recorded to the associated journal before modifying the actual image. The remote cluster will read from this associated journal and replay the updates to its local copy of the image. Since each write to the RBD image will result in two writes to the Ceph cluster, expect write latencies to nearly double while using the RBD journaling image feature.
  • Snapshot-based: This mode uses periodically scheduled or manually created RBD image mirror-snapshots to replicate crash-consistent RBD images between clusters. The remote cluster will determine any data or metadata updates between two mirror-snapshots and copy the deltas to its local copy of the image. With the help of the RBD fast-diff image feature, updated data blocks can be quickly determined without the need to scan the full RBD image. Since this mode is not as fine-grained as journaling, the complete delta between two snapshots will need to be synced prior to use during a failover scenario. Any partially applied set of deltas will be rolled back at the moment of failover.


Journal-based mirroring requires the Ceph Jewel release or later; snapshot-based mirroring requires the Ceph Octopus release or later.

Mirroring is configured on a per-pool basis within peer clusters and can be configured on a specific subset of images within the pool. You can also mirror all images within a given pool when using journal-based mirroring. Mirroring is configured using the rbd command. The rbd-mirror daemon is responsible for pulling image updates from the remote peer cluster and applying them to the image within the local cluster.

Depending on the desired needs for replication, RBD mirroring can be configured for either one- or two-way replication:

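  • One-way Replication: When data is only mirrored from a primary cluster to a secondary cluster, the rbd-mirror daemon runs only on the secondary cluster.
  • Two-way Replication: When data is mirrored from primary images on one cluster to non-primary images on another cluster (and vice versa), the rbd-mirror daemon must run on both clusters.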


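Each instance of the rbd-mirror daemon must be able to connect to both the local and remote Ceph clusters simultaneously (i.e. all monitor and OSD hosts). Additionally, the network between the two data centers must have sufficient bandwidth to handle the mirroring workload.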


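The following procedures demonstrate how to perform the basic administrative tasks to configure mirroring using the rbd command. Mirroring is configured on a per-pool basis within the Ceph clusters.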

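The pool configuration steps below should be performed on both peer clusters. For clarity, these procedures assume two clusters, named site-a and site-b in the examples, that are both accessible from a single host.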

See the rbd manpage for additional details on how to connect to different Ceph clusters.


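The cluster name in the following examples corresponds to a Ceph configuration file of the same name (e.g. /etc/ceph/site-b.conf). See the ceph-conf documentation for how to configure multiple clusters. Note that rbd-mirror does not require the source and destination clusters to have unique internal names; both can and should call themselves ceph. The configuration files that rbd-mirror needs for the local and remote clusters can be named arbitrarily, and containerizing the daemon is one strategy for maintaining them outside of /etc/ceph to avoid confusion.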


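To enable mirroring on a pool with rbd, specify the mirror pool enable command, the pool name, the mirroring mode, and an optional friendly site name to describe the local cluster: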

rbd mirror pool enable [--site-name {local-site-name}] {pool-name} {mode}

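The mirroring mode can be either image or pool:

  • image: When configured in image mode, mirroring must be explicitly enabled on each image.
  • pool (default): When configured in pool mode, all images in the pool with the journaling feature enabled are mirrored.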


$ rbd --cluster site-a mirror pool enable --site-name site-a image-pool image
$ rbd --cluster site-b mirror pool enable --site-name site-b image-pool image

The site name can also be specified when creating or importing a new bootstrap token.

The site name can be changed later using the same mirror pool enable subcommand but note that the local site name and the corresponding site name used by the remote cluster generally must match.


To disable mirroring on a pool with rbd, specify the mirror pool disable command and the pool name:

rbd mirror pool disable {pool-name}



$ rbd --cluster site-a mirror pool disable image-pool
$ rbd --cluster site-b mirror pool disable image-pool

Bootstrap Peers

In order for the rbd-mirror daemon to discover its peer cluster, the peer must be registered and a user account must be created. This process can be automated with rbd and the mirror pool peer bootstrap create and mirror pool peer bootstrap import commands.

To manually create a new bootstrap token with rbd, issue the mirror pool peer bootstrap create subcommand, a pool name, and an optional friendly site name to describe the local cluster:

rbd mirror pool peer bootstrap create [--site-name {local-site-name}] {pool-name}

The output of mirror pool peer bootstrap create will be a token that should be provided to the mirror pool peer bootstrap import command. For example, on site-a:

$ rbd --cluster site-a mirror pool peer bootstrap create --site-name site-a image-pool

To manually import the bootstrap token created by another cluster with rbd, specify the mirror pool peer bootstrap import command, the pool name, a file path to the created token (or '-' to read from standard input), along with an optional friendly site name to describe the local cluster and a mirroring direction (defaults to rx-tx for bidirectional mirroring, but can also be set to rx-only for unidirectional mirroring):

rbd mirror pool peer bootstrap import [--site-name {local-site-name}] [--direction {rx-only or rx-tx}] {pool-name} {token-path}

For example, on site-b:

$ cat <<EOF > token
{token printed by the bootstrap create command on site-a}
EOF
$ rbd --cluster site-b mirror pool peer bootstrap import --site-name site-b image-pool token
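
The create and import steps above can be sketched together as a small dry-run helper. This is a minimal sketch, not part of rbd: print_bootstrap is a hypothetical function name, the token file name follows the example above, and the function only prints the two commands so that they can be reviewed before being run on each site.

```shell
# print_bootstrap (hypothetical helper) prints, but does not run, the
# bootstrap commands for one pool: create on the source, import on the
# destination.
#   $1 pool   $2 source site   $3 destination site
#   $4 direction (optional, rx-tx is the documented default)
#   $5 token file (optional, assumed to be readable on the destination)
print_bootstrap() {
    pool=$1
    src=$2
    dst=$3
    direction=${4:-rx-tx}
    token=${5:-token}
    printf 'rbd --cluster %s mirror pool peer bootstrap create --site-name %s %s > %s\n' \
        "$src" "$src" "$pool" "$token"
    printf 'rbd --cluster %s mirror pool peer bootstrap import --site-name %s --direction %s %s %s\n' \
        "$dst" "$dst" "$direction" "$pool" "$token"
}

# One-way mirroring from site-a to site-b.
print_bootstrap image-pool site-a site-b rx-only
```

Passing rx-only matches the one-way case, where only the importing cluster receives updates; omitting the fourth argument keeps the bidirectional default.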


Cluster peers can be specified manually if desired or if the above bootstrap commands are not available with the currently installed Ceph release.

The remote rbd-mirror daemon will need access to the local cluster to perform mirroring. A new local Ceph user should be created for the remote daemon to use. To create a Ceph user, with ceph specify the auth get-or-create command, user name, monitor caps, and OSD caps:

$ ceph auth get-or-create client.rbd-mirror-peer mon 'profile rbd-mirror-peer' osd 'profile rbd'

The resulting keyring should be copied to the other cluster's rbd-mirror daemon hosts if not using the Ceph monitor config-key store described below.

To manually add a mirroring peer Ceph cluster with rbd, specify the mirror pool peer add command, the pool name, and a cluster specification:

rbd mirror pool peer add {pool-name} {client-name}@{cluster-name}


$ rbd --cluster site-a mirror pool peer add image-pool client.rbd-mirror-peer@site-b
$ rbd --cluster site-b mirror pool peer add image-pool client.rbd-mirror-peer@site-a

By default, the rbd-mirror daemon needs to have access to a Ceph configuration file located at /etc/ceph/{cluster-name}.conf that provides the addresses of the peer cluster's monitors, in addition to a keyring for {client-name} located in the default or configured keyring search paths (e.g. /etc/ceph/{cluster-name}.{client-name}.keyring).

Alternatively, the peer cluster's monitor and/or client key can be securely stored within the local Ceph monitor config-key store. To specify the peer cluster connection attributes when adding a mirroring peer, use the --remote-mon-host and --remote-key-file options. For example:

$ cat <<EOF > remote-key-file
AQAeuZdbMMoBChAAcj++/XUxNOLFaWdtTREEsw==
EOF
$ rbd --cluster site-a mirror pool peer add image-pool client.rbd-mirror-peer@site-b --remote-mon-host {mon-host-1},{mon-host-2} --remote-key-file remote-key-file
$ rbd --cluster site-a mirror pool info image-pool --all
Mode: pool
  UUID                                 NAME   CLIENT                 MON_HOST                  KEY
  587b08db-3d33-4f32-8af8-421e77abb081 site-b client.rbd-mirror-peer {mon-host-1},{mon-host-2} AQAeuZdbMMoBChAAcj++/XUxNOLFaWdtTREEsw==


To remove a mirroring peer Ceph cluster with rbd, specify the mirror pool peer remove command, the pool name, and the peer UUID (available from the rbd mirror pool info command):

rbd mirror pool peer remove {pool-name} {peer-uuid}


$ rbd --cluster site-a mirror pool peer remove image-pool 55672766-c02b-4729-8567-f13a66893445
$ rbd --cluster site-b mirror pool peer remove image-pool 60c0e299-b38f-4234-91f6-eed0a367be08


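When creating images in the destination cluster, rbd-mirror selects a data pool as follows:

  1. If the destination cluster has a default data pool configured (with the rbd_default_data_pool configuration option), it will be used.
  2. Otherwise, if the source image uses a separate data pool, and a pool with the same name exists on the destination cluster, that pool will be used.
  3. If neither of the above is true, no data pool will be set.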
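
The selection order above can be modelled as a short function. This is only an illustrative sketch of the three rules, not rbd-mirror's actual code; select_data_pool and the pool names in the usage line are hypothetical.

```shell
# select_data_pool (hypothetical) models the data pool selection rules.
#   $1    destination cluster's rbd_default_data_pool ("" if unset)
#   $2    source image's data pool ("" if the image has none)
#   $3... names of the pools that exist on the destination cluster
# Prints the chosen data pool, or nothing when no data pool is set.
select_data_pool() {
    default_pool=$1
    source_pool=$2
    shift 2
    # Rule 1: a configured default data pool is always used.
    if [ -n "$default_pool" ]; then
        printf '%s\n' "$default_pool"
        return 0
    fi
    # Rule 2: reuse the source data pool name if it exists on the destination.
    if [ -n "$source_pool" ]; then
        for pool in "$@"; do
            if [ "$pool" = "$source_pool" ]; then
                printf '%s\n' "$source_pool"
                return 0
            fi
        done
    fi
    # Rule 3: no data pool is set.
    return 0
}

select_data_pool "" ec-data rbd ec-data   # prints "ec-data"
```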


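Unlike pool configuration, image configuration only needs to be performed against a single mirroring peer Ceph cluster.

Mirrored RBD images are designated as either primary or non-primary. This is a property of the image and not the pool. Images that are designated as non-primary cannot be modified.

Images are automatically promoted to primary when mirroring is first enabled on an image (either implicitly, if the pool mirror mode is pool and the image has the journaling image feature enabled, or explicitly with the rbd command, if the pool mirror mode is image).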


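If mirroring is configured in image mode for the image's pool, then mirroring must be explicitly enabled for each image within the pool. To enable mirroring for a specific image with rbd, specify the mirror image enable command along with the pool, image name, and mode: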

rbd mirror image enable {pool-name}/{image-name} {mode}

The mirror image mode can either be journal or snapshot:

  • journal (default): When configured in journal mode, mirroring will utilize the RBD journaling image feature to replicate the image contents. If the RBD journaling image feature is not yet enabled on the image, it will be automatically enabled.
  • snapshot: When configured in snapshot mode, mirroring will utilize RBD image mirror-snapshots to replicate the image contents. Once enabled, an initial mirror-snapshot will automatically be created. Additional RBD image mirror-snapshots can be created by the rbd command.


$ rbd --cluster site-a mirror image enable image-pool/image-1 snapshot
$ rbd --cluster site-a mirror image enable image-pool/image-2 journal

Enable Image Journaling Feature

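RBD journal-based mirroring uses the RBD image journaling feature to ensure that the replicated image always remains crash-consistent. When using the image mirroring mode, the journaling feature is automatically enabled when journal-based mirroring is enabled on the image. When using the pool mirroring mode, the RBD image journaling feature must be enabled before an image can be mirrored to a peer cluster. The feature can be enabled at image creation time by providing the --image-feature exclusive-lock,journaling option to the rbd command.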

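Alternatively, the journaling feature can be dynamically enabled on pre-existing RBD images. To enable journaling with rbd, specify the feature enable command, the pool and image name, and the feature name: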

rbd feature enable {pool-name}/{image-name} {feature-name}


$ rbd --cluster site-a feature enable image-pool/image-1 journaling


The journaling feature is dependent on the exclusive-lock feature. If the exclusive-lock feature is not already enabled, it should be enabled prior to enabling the journaling feature.


You can enable journaling on all new images by default by adding rbd default features = 125 to your Ceph configuration file.
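
The value 125 is a bitmask of image features. The sketch below shows how it decomposes, using the standard RBD feature bit values; feature_bit and features_mask are hypothetical helper names used only for illustration.

```shell
# Bit values of the RBD image features that make up 125.
# (striping, bit value 2, is not included in that mask.)
feature_bit() {
    case $1 in
        layering)       echo 1 ;;
        exclusive-lock) echo 4 ;;
        object-map)     echo 8 ;;
        fast-diff)      echo 16 ;;
        deep-flatten)   echo 32 ;;
        journaling)     echo 64 ;;
        *) echo "unknown feature: $1" >&2; return 1 ;;
    esac
}

# OR together the bits of every feature named on the command line.
features_mask() {
    mask=0
    for feature in "$@"; do
        bit=$(feature_bit "$feature") || return 1
        mask=$((mask | bit))
    done
    echo "$mask"
}

features_mask layering exclusive-lock object-map fast-diff deep-flatten journaling   # prints 125
```

Dropping journaling from the list gives 61, which makes it easy to see that the difference between the two defaults is exactly the journaling bit (64).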


rbd-mirror tunables are set by default to values suitable for mirroring an entire pool. When using rbd-mirror to migrate single volumes between clusters, you may achieve substantial performance gains by setting rbd_mirror_journal_max_fetch_bytes=33554432 and rbd_journal_max_payload_bytes=8388608 within the [client] config section of the local or centralized configuration. Note that these settings may allow rbd-mirror to present a substantial write workload to the destination cluster: monitor cluster performance closely during migrations and test carefully before running multiple migrations in parallel.
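
The two values are 32 MiB and 8 MiB expressed in bytes. As a minimal sketch, the helper below (print_migration_tunables is a hypothetical name) prints the corresponding [client] fragment so the arithmetic is visible; where the fragment is placed depends on how the cluster's configuration is managed.

```shell
# Print a [client] section with the migration tunables from the text.
# The byte counts are derived from MiB so the sizes are easy to check.
print_migration_tunables() {
    mib=$((1024 * 1024))
    cat <<EOF
[client]
rbd_mirror_journal_max_fetch_bytes = $((32 * mib))
rbd_journal_max_payload_bytes = $((8 * mib))
EOF
}

print_migration_tunables
```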

Create Image Mirror-Snapshots

When using snapshot-based mirroring, mirror-snapshots will need to be created whenever it is desired to mirror the changed contents of the RBD image. To create a mirror-snapshot manually with rbd, specify the mirror image snapshot command along with the pool and image name:

rbd mirror image snapshot {pool-name}/{image-name}


$ rbd --cluster site-a mirror image snapshot image-pool/image-1

By default up to 5 mirror-snapshots will be created per-image. The most recent mirror-snapshot is automatically pruned if the limit is reached. The limit can be overridden via the rbd_mirroring_max_mirroring_snapshots configuration option if required. Additionally, mirror-snapshots are automatically deleted when the image is removed or when mirroring is disabled.

Mirror-snapshots can also be automatically created on a periodic basis if mirror-snapshot schedules are defined. Mirror-snapshots can be scheduled at the global, per-pool, or per-image level. Multiple mirror-snapshot schedules can be defined at any level, but only the most-specific snapshot schedules that match an individual mirrored image will run.

To create a mirror-snapshot schedule with rbd, specify the mirror snapshot schedule add command along with an optional pool or image name; interval; and optional start time:

rbd mirror snapshot schedule add [--pool {pool-name}] [--image {image-name}] {interval} [{start-time}]

The interval can be specified in days, hours, or minutes using the d, h, or m suffix respectively. The optional start-time can be specified using the ISO 8601 time format. For example:

$ rbd --cluster site-a mirror snapshot schedule add --pool image-pool 24h 14:00:00-05:00
$ rbd --cluster site-a mirror snapshot schedule add --pool image-pool --image image1 6h

To remove a mirror-snapshot schedule with rbd, specify the mirror snapshot schedule remove command with options that match the corresponding add schedule command.

To list all snapshot schedules for a specific level (global, pool, or image) with rbd, specify the mirror snapshot schedule ls command along with an optional pool or image name. Additionally, the --recursive option can be specified to list all schedules at the specified level and below. For example:

$ rbd --cluster site-a mirror snapshot schedule ls --pool image-pool --recursive
image-pool  -         -      every 1d starting at 14:00:00-05:00
image-pool            image1 every 6h
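
Note that the schedule added as 24h is listed as every 1d. The sketch below converts an interval with a d, h, or m suffix to minutes to show that the two spellings describe the same period; interval_to_minutes is a hypothetical helper and does not reflect how rbd parses intervals internally.

```shell
# interval_to_minutes (hypothetical) converts "<number><d|h|m>" to minutes.
interval_to_minutes() {
    value=${1%[dhm]}        # numeric part, e.g. "24" from "24h"
    unit=${1#"$value"}      # remaining suffix: d, h or m
    case $value in
        ''|*[!0-9]*) echo "invalid interval: $1" >&2; return 1 ;;
    esac
    case $unit in
        d) echo $((value * 24 * 60)) ;;
        h) echo $((value * 60)) ;;
        m) echo "$value" ;;
        *) echo "invalid interval: $1" >&2; return 1 ;;
    esac
}

interval_to_minutes 24h   # prints 1440
interval_to_minutes 1d    # prints 1440
```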

To see when the next mirror-snapshots will be created for snapshot-based mirrored RBD images with rbd, specify the mirror snapshot schedule status command along with an optional pool or image name:

rbd mirror snapshot schedule status [--pool {pool-name}] [--image {image-name}]


$ rbd --cluster site-a mirror snapshot schedule status
2020-02-26 18:00:00 image-pool/image1


To disable mirroring for a specific image with rbd, specify the mirror image disable command along with the pool and image name:

rbd mirror image disable {pool-name}/{image-name}


$ rbd --cluster site-a mirror image disable image-pool/image-1




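In a failover scenario where the primary designation needs to be moved to the image in the peer Ceph cluster, access to the primary image should be stopped (e.g. power down the VM or remove the associated drive from a VM). Then demote the current primary image, promote the new primary image, and resume access to the image on the alternate cluster.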


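RBD only provides the necessary tools to facilitate an orderly failover of an image. An external mechanism is required to coordinate the full failover process (e.g. closing the image before demotion).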

To demote a specific image to non-primary with rbd, specify the mirror image demote command along with the pool and image name:

rbd mirror image demote {pool-name}/{image-name}


$ rbd --cluster site-a mirror image demote image-pool/image-1

To demote all primary images within a pool to non-primary with rbd, specify the mirror pool demote command along with the pool name:

rbd mirror pool demote {pool-name}


$ rbd --cluster site-a mirror pool demote image-pool

To promote a specific image to primary with rbd, specify the mirror image promote command along with the pool and image name:

rbd mirror image promote [--force] {pool-name}/{image-name}


$ rbd --cluster site-b mirror image promote image-pool/image-1

To promote all non-primary images within a pool to primary with rbd, specify the mirror pool promote command along with the pool name:

rbd mirror pool promote [--force] {pool-name}


$ rbd --cluster site-a mirror pool promote image-pool
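
The orderly failover of a single image described above can be sketched as a dry-run helper. This is a minimal sketch: print_failover is a hypothetical name, the function only prints the demote and promote commands in order, and stopping client access beforehand and resuming it afterwards remain the job of an external mechanism.

```shell
# print_failover (hypothetical) prints the command sequence for moving the
# primary designation of one image from the old cluster to the new one.
#   $1 pool/image   $2 current primary site   $3 new primary site
print_failover() {
    image=$1
    old=$2
    new=$3
    # Client access to the image on $old must already be stopped here.
    printf 'rbd --cluster %s mirror image demote %s\n' "$old" "$image"
    printf 'rbd --cluster %s mirror image promote %s\n' "$new" "$image"
}

print_failover image-pool/image-1 site-a site-b
```

Swapping the two site arguments gives the failback sequence.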


Since the primary / non-primary status is per-image, it is possible to have two clusters split the IO load and stage failover / failback.


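Promotion can be forced using the --force option. Forced promotion is needed when the demotion cannot be propagated to the peer Ceph cluster (e.g. Ceph cluster failure, communication outage). This will result in a split-brain scenario between the two peers, and the image will no longer be in sync until a forced resync command is issued.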


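If a split-brain event is detected by the rbd-mirror daemon, it will not attempt to mirror the affected image until the condition is corrected. To resume mirroring for an image, first demote the image determined to be out-of-date and then request a resync to the primary image. To request an image resync with rbd, specify the mirror image resync command along with the pool and image name: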

rbd mirror image resync {pool-name}/{image-name}


$ rbd mirror image resync image-pool/image-1


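The rbd command only flags the image as requiring a resync. The local cluster's rbd-mirror daemon process is responsible for performing the resync asynchronously.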
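
The recovery steps can be sketched as a dry-run helper. This is a sketch under stated assumptions: print_split_brain_recovery is a hypothetical name, the out-of-date copy is assumed to have been identified by the operator, and both commands are assumed to be issued against the cluster holding that out-of-date copy.

```shell
# print_split_brain_recovery (hypothetical) prints the two recovery commands.
#   $1 pool/image   $2 site that holds the out-of-date copy (assumption:
#   both commands target this site)
print_split_brain_recovery() {
    image=$1
    stale=$2
    printf 'rbd --cluster %s mirror image demote %s\n' "$stale" "$image"
    printf 'rbd --cluster %s mirror image resync %s\n' "$stale" "$image"
}

print_split_brain_recovery image-pool/image-1 site-a
```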


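The replication status of the peer cluster is stored for every primary mirrored image. This status can be retrieved using the mirror image status and mirror pool status commands.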

To request the mirror image status with rbd, specify the mirror image status command along with the pool and image name:

rbd mirror image status {pool-name}/{image-name}


$ rbd mirror image status image-pool/image-1

To request the mirror pool summary status with rbd, specify the mirror pool status command along with the pool name:

rbd mirror pool status {pool-name}


$ rbd mirror pool status image-pool


Adding the --verbose option to the mirror pool status command additionally outputs status details for every mirrored image in the pool.

rbd-mirror Daemon

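The two rbd-mirror daemons are responsible for watching image journals on the remote, peer cluster and replaying the journal events against the local cluster. The RBD image journaling feature records all modifications to the image in the order they occur. This ensures that a crash-consistent mirror of the remote image is available locally.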

The rbd-mirror daemon is available within the optional rbd-mirror distribution package.


Each rbd-mirror daemon requires the ability to connect to both clusters simultaneously.


Releases older than Luminous: run only a single rbd-mirror daemon per Ceph cluster.

Each rbd-mirror daemon should use a unique Ceph user ID. To create a Ceph user, with ceph specify the auth get-or-create command, user name, monitor caps, and OSD caps:

ceph auth get-or-create client.rbd-mirror.{unique id} mon 'profile rbd-mirror' osd 'profile rbd'

The rbd-mirror daemon can be managed by systemd by specifying the user ID as the daemon instance:

systemctl enable ceph-rbd-mirror@rbd-mirror.{unique id}
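
The user creation and systemd steps share the same unique ID, which the following dry-run sketch makes explicit. print_daemon_setup is a hypothetical helper that only prints the two commands shown above with the ID filled in; the ID site-b used in the example call is an arbitrary placeholder.

```shell
# print_daemon_setup (hypothetical) prints the commands that create the
# daemon's Ceph user and enable its systemd unit for one unique ID.
print_daemon_setup() {
    id=$1
    printf "ceph auth get-or-create client.rbd-mirror.%s mon 'profile rbd-mirror' osd 'profile rbd'\n" "$id"
    printf 'systemctl enable ceph-rbd-mirror@rbd-mirror.%s\n' "$id"
}

print_daemon_setup site-b
```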

rbd-mirror can also be run in the foreground with the following command:

rbd-mirror -f --log-file={log_path}