Adding instances
================

.. _replication-add_replica:

This tutorial is intended as a follow-up to the
:ref:`replication bootstrapping <replication-bootstrap>` guide.

We also recommend specifying the master #3 URI in all instance files in order to
keep all the files consistent with each other and with the current replication
topology.
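
By way of illustration, here is a minimal instance-file sketch that lists the
whole topology, including master #3; all URIs and credentials below are
placeholders, not values from this guide:

.. code-block:: lua

    -- sketch: every instance file carries the same replication source list,
    -- including the newly added master #3 (placeholder URIs)
    box.cfg{
        listen = 3301,
        replication = {'replicator:password@192.168.0.101:3301',  -- master #1
                       'replicator:password@192.168.0.102:3301',  -- master #2
                       'replicator:password@192.168.0.103:3301'}, -- master #3
    }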

.. _replication-orphan_status:

Orphan status
-------------

Starting with Tarantool version 1.9, there is a change to the
procedure when an instance joins a replica set.
During ``box.cfg()`` the instance tries to join all masters listed
in :ref:`box.cfg.replication <cfg_replication-replication>`.
If the instance fails to connect to at least the number of masters specified in
:ref:`replication_connect_quorum <cfg_replication-replication_connect_quorum>`,
it switches to **orphan status**.
While an instance is in orphan status, it is read-only.
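
For a concrete picture, here is a hedged sketch assuming the three-master
source list shown above; the quorum value is an example, not a recommendation:

.. code-block:: lua

    -- with three configured masters, a quorum of 2 means the instance must
    -- connect to at least two of them during box.cfg(); otherwise it stays
    -- in orphan status
    box.cfg{replication_connect_quorum = 2}

    -- orphan status is visible afterwards:
    box.info.status   -- 'orphan' until the quorum is reached
    box.info.ro       -- true: an orphan instance is read-only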

To "join" a master, a replica instance must "connect" to the
master node and then "sync".

"Connect" means contact the master over the physical network
and receive acknowledgment. If there is no acknowledgment after
:ref:`box.replication_connect_timeout <cfg_replication-replication_connect_timeout>`
seconds (usually 4 seconds), and retries fail, then the connect step fails.
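
Both the timeout and the per-peer connection state can be inspected at runtime.
A small sketch; the peer id ``1`` is only an example (the record for the
instance's own id has no ``upstream`` field):

.. code-block:: lua

    -- give the connect step more than the usual 4 seconds, if needed
    -- (replication_connect_timeout is a dynamic option)
    box.cfg{replication_connect_timeout = 10}

    -- per-peer connection state on the replica, e.g. 'follow' once connected
    -- and synced, 'disconnected' while the peer cannot be reached
    box.info.replication[1].upstream.status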

"Sync" means receive updates
from the master in order to make a local database copy.
Syncing is complete when the replica has received all the
updates, or at least has received enough updates that the replica's lag
(see :ref:`replication.upstream.lag <box_info_replication_upstream_lag>`
in ``box.info()``)
is less than or equal to the number of seconds specified in
:ref:`box.cfg.replication_sync_lag <cfg_replication-replication_sync_lag>`.
If ``replication_sync_lag`` is unset (nil) or set to TIMEOUT_INFINITY, then
the replica skips the "sync" state and switches to "follow" immediately.
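
The current lag can be read from ``box.info`` and compared with the configured
threshold; a console sketch (the peer id ``1`` is again just an example):

.. code-block:: lua

    -- lag of this replica relative to the master with id 1, in seconds
    box.info.replication[1].upstream.lag

    -- the threshold that the "sync" state waits for, if one was configured
    box.cfg.replication_sync_lag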

In order to leave orphan mode, you need to sync with a sufficient number
(:ref:`replication_connect_quorum <cfg_replication-replication_connect_quorum>`) of
instances. To do so, you may do any of the following (see the sketch after this
list):

* Set :ref:`replication_connect_quorum <cfg_replication-replication_connect_quorum>`
  to a lower value.
* Reset ``box.cfg.replication`` to exclude instances that cannot be reached
  or synced with.
* Set ``box.cfg.replication`` to ``""`` (empty string).
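
All three options are ordinary ``box.cfg()`` calls and can be issued on the
running instance; a hedged sketch with placeholder values:

.. code-block:: lua

    -- option 1: lower the quorum to what is actually reachable right now
    box.cfg{replication_connect_quorum = 1}

    -- option 2: keep only the peers that can be reached and synced with
    -- (placeholder URI)
    box.cfg{replication = {'replicator:password@192.168.0.101:3301'}}

    -- option 3: clear the replication source list entirely
    box.cfg{replication = ""}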

The following situations are possible.

.. _replication-leader:

**Situation 1: bootstrap**

Here ``box.cfg{}`` is being called for the first time.
A replica is joining but no replica set exists yet.

 1. Set status to 'orphan'.

 2. Try to connect to all nodes from ``box.cfg.replication``,
    or to the number of nodes required by
    :ref:`replication_connect_quorum <cfg_replication-replication_connect_quorum>`.
    Retrying up to 3 times in 30 seconds is possible because this is bootstrap;
    :ref:`replication_connect_timeout <cfg_replication-replication_connect_timeout>`
    is overridden.

 3. Abort and throw an error if the instance is connected neither to all nodes
    in ``box.cfg.replication`` nor to the number of nodes required by
    :ref:`replication_connect_quorum <cfg_replication-replication_connect_quorum>`.

 4. This instance might be elected as the replica set 'leader'.
    Criteria for electing a leader include vclock value (largest is best)
    and whether it is read-only or read-write (read-write is best unless there
    is no other choice).
    The leader is the master that other instances must join.
    The leader is the master that executes
    :doc:`box.once() </reference/reference_lua/box_once>` functions
    (see the sketch after this list).

 5. If this instance is elected as the replica set leader, then
    perform an "automatic bootstrap":

    a. Set status to 'running'.
    b. Return from ``box.cfg{}``.

    Otherwise this instance will be a replica joining an existing replica set,
    so:

    a. Bootstrap from the leader.
       See examples in section :ref:`Bootstrapping a replica set <replication-bootstrap>`.
    b. In the background, sync with all the other nodes in the replica set.
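
As an illustration of the note about ``box.once()`` in step 4, here is a hedged
sketch of code that can appear in every instance file but whose body runs only
once per replica set, on the bootstrap leader; the key ``'schema-v1'`` and the
space name ``test`` are placeholders:

.. code-block:: lua

    -- executed on every instance after box.cfg{}, but the function body runs
    -- only once per replica set, i.e. on the bootstrap leader
    box.once('schema-v1', function()
        box.schema.space.create('test')
        box.space.test:create_index('primary')
    end)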

**Situation 2: recovery**

Here ``box.cfg{}`` is not being called for the first time.
It is being called again in order to perform recovery.

 1. Perform :ref:`recovery <internals-recovery_process>` from the last local
    snapshot and the WAL files.

 2. Connect to at least
    :ref:`replication_connect_quorum <cfg_replication-replication_connect_quorum>`
    nodes. If this fails, set status to 'orphan'.
    (Attempts to sync will continue in the background, and when/if they succeed,
    'orphan' will be changed to 'connected'; see the sketch after this list.)

 3. If connected, sync with all connected nodes until the difference is not
    more than
    :ref:`replication_sync_lag <cfg_replication-replication_sync_lag>` seconds.
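
If you want to see when a recovered instance finally leaves orphan status, here
is a purely illustrative monitoring sketch using the standard ``fiber`` and
``log`` modules:

.. code-block:: lua

    local fiber = require('fiber')
    local log = require('log')

    -- background fiber: log a message once the instance is no longer an orphan
    fiber.create(function()
        while box.info.status == 'orphan' do
            fiber.sleep(1)
        end
        log.info('instance left orphan status, now: %s', box.info.status)
    end)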

.. _replication-configuration_update:

**Situation 3: configuration update**

Here ``box.cfg{}`` is not being called for the first time.
It is being called again because some replication parameter
or something in the replica set has changed (see the sketch after the
following steps).

 1. Try to connect to all nodes from ``box.cfg.replication``,
    or to the number of nodes required by
    :ref:`replication_connect_quorum <cfg_replication-replication_connect_quorum>`,
    within the time period specified in
    :ref:`replication_connect_timeout <cfg_replication-replication_connect_timeout>`.

 2. Try to sync with the connected nodes,
    within the time period specified in
    :ref:`replication_sync_timeout <cfg_replication-replication_sync_timeout>`.

 3. If earlier steps fail, change status to 'orphan'.
    (Attempts to sync will continue in the background, and when/if they succeed,
    the 'orphan' status will end.)

 4. If earlier steps succeed, set status to 'running' (master) or 'follow' (replica).
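
A typical trigger for this situation is simply reconfiguring replication on a
running instance; a hedged sketch (all URIs and values are placeholders) that
makes ``box.cfg()`` go through the connect and sync steps above:

.. code-block:: lua

    -- reconfigure replication at runtime, for example to add a fourth peer
    box.cfg{
        replication = {'replicator:password@192.168.0.101:3301',
                       'replicator:password@192.168.0.102:3301',
                       'replicator:password@192.168.0.103:3301',
                       'replicator:password@192.168.0.104:3301'},
        replication_sync_timeout = 30,  -- example value for the sync step
    }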

.. _replication-configuration_rebootstrap:

**Situation 4: rebootstrap**

Here ``box.cfg{}`` is not being called. The replica connected successfully
at some point in the past and is now expecting an update from the master,
but the master cannot provide one.
This can happen by accident, or more likely because the replica
is slow (its :ref:`lag <cfg_replication-replication_sync_lag>` is large)
and the WAL (.xlog) files containing the
updates have been deleted. This is not crippling. The replica can discard
what it received earlier and then ask for the master's latest snapshot
(.snap) file contents. Since it is effectively going through the bootstrap
process a second time, this is called "rebootstrapping". However, there has
to be one difference from an ordinary bootstrap: the replica's
:ref:`replica id <replication-replica-id>` remains the same.
If it changed, the master would think that the replica is a
new addition to the cluster and would maintain a record of an
instance ID for a replica that has ceased to exist. Rebootstrapping was
introduced in Tarantool version 1.10.2 and is completely automatic.
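
For reference, the replica id that a rebootstrap preserves is visible both on
the replica and in the master's view of the replica set; a console sketch:

.. code-block:: lua

    -- on the replica: its numeric id in the replica set, which a rebootstrap
    -- keeps unchanged
    box.info.id

    -- on the master: per-replica records are keyed by these same ids
    box.info.replication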