ecctool examples

This page shows examples of ecctool usage. For general ecctool documentation, refer to ECCTOOL.md.

repairs

In this example, we will use ecctool repairs to check the status of manual repairs. The output shows all manual repairs for all ecChronos instances.

-----------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id                                   | Host Id                              | Keyspace | Table  | Status    | Repaired(%) | Completed at        | Repair type |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
| f4fb2b38-b9d0-4390-97ca-eeb284391f80 | ee32d9c7-1a4e-40c2-9b28-1000544011ae | test     | table1 | IN_QUEUE  | 0.00        | -                   | VNODE       |
| f4fb2b38-b9d0-4390-97ca-eeb284391f80 | ba7665b2-5a7b-42b5-9f38-037f2da1e80a | test     | table1 | COMPLETED | 100.00      | 2022-09-22 13:40:07 | VNODE       |
| 9c86aa0e-b4af-4f3c-a89c-1596f8d1dd2a | ba7665b2-5a7b-42b5-9f38-037f2da1e80a | test     | table2 | COMPLETED | 100.00      | 2023-09-21 15:26:58 | INCREMENTAL |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Summary: 2 completed, 1 in queue, 0 blocked, 0 warning, 0 error

Looking at the example output above, the columns are:

Id - the manual repair ID. A manual repair triggered on several hosts has the same ID on each host.

Host Id - the host ID of the Cassandra instance that the ecChronos instance responsible for the manual repair is connected to.

Keyspace - the keyspace the manual repair is run on.

Table - the table the manual repair is run on.

Status - the status of the manual repair. The possible statuses are:

  • IN_QUEUE - the manual repair is awaiting execution or is currently running.
  • ERROR - the manual repair failed; some ranges might have failed or the ranges have changed.
  • COMPLETED - the manual repair is completed, all ranges have been repaired.

Repaired(%) - the percentage of ranges repaired out of the total ranges. For manual repairs this value should never go down.

Completed at - the time when the manual repair has finished.

Repair type - the type of the repair: VNODE, PARALLEL_VNODE or INCREMENTAL. All repairs before ecChronos 5.0 were VNODE.
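If the table needs to be processed by a script, the pipe-delimited layout shown above can be split into rows fairly easily. The following is a minimal Python sketch, not part of ecctool: the function name parse_table and the SAMPLE string are illustrative, and it assumes the output keeps the layout shown in this example (separator lines made of dashes, one header row, one row per repair).

```python
from collections import Counter

# Sample rows copied from the example output above (separator lines shortened).
SAMPLE = """\
--------------------
| Id                                   | Host Id                              | Keyspace | Table  | Status    | Repaired(%) | Completed at        | Repair type |
--------------------
| f4fb2b38-b9d0-4390-97ca-eeb284391f80 | ee32d9c7-1a4e-40c2-9b28-1000544011ae | test     | table1 | IN_QUEUE  | 0.00        | -                   | VNODE       |
| f4fb2b38-b9d0-4390-97ca-eeb284391f80 | ba7665b2-5a7b-42b5-9f38-037f2da1e80a | test     | table1 | COMPLETED | 100.00      | 2022-09-22 13:40:07 | VNODE       |
| 9c86aa0e-b4af-4f3c-a89c-1596f8d1dd2a | ba7665b2-5a7b-42b5-9f38-037f2da1e80a | test     | table2 | COMPLETED | 100.00      | 2023-09-21 15:26:58 | INCREMENTAL |
--------------------
"""


def parse_table(output):
    """Turn ecctool-style table text into a list of dicts keyed by column name.

    Hypothetical helper: it relies only on the layout shown in this document.
    """
    header = None
    rows = []
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip separator lines and any summary or snapshot line
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first pipe-delimited line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


rows = parse_table(SAMPLE)
status_counts = Counter(row["Status"] for row in rows)
print(status_counts["COMPLETED"], "completed,", status_counts["IN_QUEUE"], "in queue")
```

Parsing the human-readable output is fragile if the column layout changes, so treat this as a convenience for ad hoc checks only.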

schedules

In this example, we will use ecctool schedules to check the status of schedules. The output shows all schedules of the local ecChronos instance. For new tables, the completed time is calculated using a configurable initial delay.

Snapshot as of 2023-06-12 15:33:25
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id                                   | Keyspace              | Table              | Status    | Repaired(%) | Completed at        | Next repair         | Repair type |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 51a64d70-0924-11ee-9173-1f4a33dd583b | test                  | table2             | COMPLETED | 100.00      | 2023-06-06 15:33:19 | 2023-06-13 15:33:19 | VNODE       |
| 552d5150-0924-11ee-9173-1f4a33dd583b | keyspaceWithCamelCase | tableWithCamelCase | COMPLETED | 100.00      | 2023-06-06 15:33:19 | 2023-06-13 15:33:19 | VNODE       |
| 5365b0b0-0924-11ee-9173-1f4a33dd583b | test2                 | table1             | COMPLETED | 100.00      | 2023-06-06 15:33:19 | 2023-06-13 15:33:19 | VNODE       |
| 54079600-0924-11ee-9173-1f4a33dd583b | test2                 | table2             | COMPLETED | 100.00      | 2023-06-06 15:33:19 | 2023-06-13 15:33:19 | VNODE       |
| 51021e30-0924-11ee-9173-1f4a33dd583b | test                  | table1             | COMPLETED | 100.00      | 2023-06-12 15:26:27 | 2023-06-19 15:26:27 | VNODE       |
| 24280063-034f-4f63-9e40-2f8a86ca569b | test                  | table2             | COMPLETED | 100.00      | 2023-06-12 15:26:53 | 2023-06-13 15:26:53 | INCREMENTAL |
| d4c0fdb8-8183-4bbe-b714-53f8aff892f0 | test                  | table1             | COMPLETED | 100.00      | 2023-06-12 15:33:19 | 2023-06-13 15:33:19 | INCREMENTAL |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Summary: 7 completed, 0 on time, 0 blocked, 0 late, 0 overdue

Looking at the example output above, the first line is Snapshot as of 2023-06-12 15:33:25. This means that the output is tied to a point in time and might change if the subcommand is run at a different time.

Looking at the table, the columns are:

Id - the schedule ID. This corresponds to the table ID.

Keyspace - the keyspace the repair is run on.

Table - the table the repair is run on.

Status - the status of the repair. The possible statuses are:

  • ON_TIME - the schedule is awaiting execution or is currently running.
  • LATE - the schedule is late; the warning time specified in the configuration has passed.
  • OVERDUE - the schedule is overdue; the error time specified in the configuration has passed.
  • COMPLETED - the schedule is completed; all ranges have been repaired within the interval.
  • BLOCKED - the schedule is blocked. This occurs if a schedule should be executing but is blocked by a run-policy, or if a repair task has failed and triggered a backoff (30 minutes).
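To make the relationship between the statuses easier to follow, here is a rough Python sketch of how they could be derived from the time since the last completed repair. This is an illustration of the descriptions above and not the ecChronos implementation: the function schedule_status, its parameters, the order of the checks and the 7/8/10 day values are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def schedule_status(now, completed_at, interval, warn_after, error_after, blocked=False):
    """Hypothetical status derivation based on the descriptions in this document."""
    since_completed = now - completed_at
    if since_completed < interval:
        return "COMPLETED"  # all ranges were repaired within the interval
    if blocked:
        return "BLOCKED"  # should be executing, but a run-policy or backoff prevents it
    if since_completed >= error_after:
        return "OVERDUE"  # the configured error time has passed
    if since_completed >= warn_after:
        return "LATE"  # the configured warning time has passed
    return "ON_TIME"  # awaiting execution or currently running


# Assumed settings for illustration only.
INTERVAL = timedelta(days=7)
WARN = timedelta(days=8)
ERROR = timedelta(days=10)

now = datetime(2023, 6, 12, 15, 33, 25)
completed_at = datetime(2023, 6, 6, 15, 33, 19)
print(schedule_status(now, completed_at, INTERVAL, WARN, ERROR))
```

With these assumed values, the first row of the example output (completed at 2023-06-06 15:33:19, snapshot taken at 2023-06-12 15:33:25) falls inside the interval and is reported as COMPLETED.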

Repaired(%) - the percentage of ranges repaired within the interval out of the total ranges. For schedules this value can go up and down as ranges become unrepaired.

Completed at - the time when all ranges for the schedule were repaired. ecChronos assumes all ranges are repaired if there's no repair history.

Next repair - the time when the schedule will be made ready for execution. It is calculated as (oldest range repair time + interval) - repair time taken for the ranges, and is updated each time a repair group is completed.

Repair type - the type of the schedule: VNODE, PARALLEL_VNODE or INCREMENTAL. All schedules before ecChronos 5.0 were VNODE.
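The Next repair formula can be checked by hand against the example output. The short Python sketch below restates it; the function name next_repair is illustrative, and the 7 day interval and zero repair time are assumptions inferred from the first VNODE row of the example, not values read from any configuration.

```python
from datetime import datetime, timedelta


def next_repair(oldest_range_repaired_at, interval, repair_time_taken):
    """(oldest range repair time + interval) - repair time taken for the ranges."""
    return oldest_range_repaired_at + interval - repair_time_taken


# First VNODE row of the example: completed at 2023-06-06 15:33:19.
oldest = datetime(2023, 6, 6, 15, 33, 19)

# Assumed: a 7 day interval and a negligible repair time.
print(next_repair(oldest, timedelta(days=7), timedelta(0)))  # 2023-06-13 15:33:19

# If the ranges took 10 minutes to repair, the schedule becomes ready 10 minutes earlier.
print(next_repair(oldest, timedelta(days=7), timedelta(minutes=10)))  # 2023-06-13 15:23:19
```

Subtracting the repair time lets a repair start early enough to finish before the oldest range falls outside the interval.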

run-repair

In this example we will use ecctool run-repair to run a manual repair for all ecChronos instances, for all keyspaces and tables. The output shows created manual repairs for all ecChronos instances.

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id                                   | Host Id                              | Keyspace              | Table              | Status   | Repaired(%) | Completed at | Repair type |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 497eb4cf-9275-4216-9cca-12958bde28af | 6424a5fa-69ea-49a3-a542-4751d0283c9a | test2                 | table2             | IN_QUEUE | 0.00        | -            | VNODE       |
| 497eb4cf-9275-4216-9cca-12958bde28af | 045c01c1-ff50-4de2-8da3-1b9270c382b5 | test2                 | table2             | IN_QUEUE | 0.00        | -            | VNODE       |
| ea32c8b5-3e8a-466a-b5f5-f9248e73774c | 6424a5fa-69ea-49a3-a542-4751d0283c9a | test                  | table2             | IN_QUEUE | 0.00        | -            | VNODE       |
| ea32c8b5-3e8a-466a-b5f5-f9248e73774c | 045c01c1-ff50-4de2-8da3-1b9270c382b5 | test                  | table2             | IN_QUEUE | 0.00        | -            | VNODE       |
| 0d8845fd-84dc-435c-8b8b-83b701dd2cbd | 045c01c1-ff50-4de2-8da3-1b9270c382b5 | keyspaceWithCamelCase | tableWithCamelCase | IN_QUEUE | 0.00        | -            | VNODE       |
| 0d8845fd-84dc-435c-8b8b-83b701dd2cbd | 6424a5fa-69ea-49a3-a542-4751d0283c9a | keyspaceWithCamelCase | tableWithCamelCase | IN_QUEUE | 0.00        | -            | VNODE       |
| c5f830b0-533a-464f-80ed-aa8b90248ba3 | 6424a5fa-69ea-49a3-a542-4751d0283c9a | test2                 | table1             | IN_QUEUE | 0.00        | -            | VNODE       |
| c5f830b0-533a-464f-80ed-aa8b90248ba3 | 045c01c1-ff50-4de2-8da3-1b9270c382b5 | test2                 | table1             | IN_QUEUE | 0.00        | -            | VNODE       |
| 73c27554-58a5-47b9-a2ab-01b9fcfad4f0 | 6424a5fa-69ea-49a3-a542-4751d0283c9a | test                  | table1             | IN_QUEUE | 0.00        | -            | VNODE       |
| 73c27554-58a5-47b9-a2ab-01b9fcfad4f0 | 045c01c1-ff50-4de2-8da3-1b9270c382b5 | test                  | table1             | IN_QUEUE | 0.00        | -            | VNODE       |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Summary: 0 completed, 10 in queue, 0 blocked, 0 warning, 0 error

Looking at the example output above, the columns are:

Id - the manual repair ID. A manual repair triggered on several hosts has the same ID on each host.

Host Id - the host ID of the Cassandra instance that the ecChronos instance responsible for the manual repair is connected to.

Keyspace - the keyspace the manual repair is run on.

Table - the table the manual repair is run on.

Status - the status of the manual repair. This will always be IN_QUEUE for newly run manual repairs.

Repaired(%) - the percentage of ranges repaired out of the total ranges. For manual repairs this value should never go down. This will always be 0 for newly run manual repairs.

Completed at - the time when the manual repair has finished. This will always be - for newly run manual repairs.

After running this subcommand, use ecctool repairs to check the progress of the manual repairs.
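Because a manual repair triggered on several hosts shares one ID, progress is easiest to judge per ID rather than per row. The Python sketch below groups rows by ID and reports whether every host has finished. It is only an illustration: the function repair_progress is hypothetical, and the rows are assumed to have already been extracted from the ecctool repairs output into dicts keyed by column name (values taken from the repairs example on this page).

```python
from collections import defaultdict

# Rows as they might look after extracting the `ecctool repairs` table.
ROWS = [
    {"Id": "f4fb2b38-b9d0-4390-97ca-eeb284391f80", "Status": "IN_QUEUE", "Repaired(%)": "0.00"},
    {"Id": "f4fb2b38-b9d0-4390-97ca-eeb284391f80", "Status": "COMPLETED", "Repaired(%)": "100.00"},
    {"Id": "9c86aa0e-b4af-4f3c-a89c-1596f8d1dd2a", "Status": "COMPLETED", "Repaired(%)": "100.00"},
]


def repair_progress(rows):
    """Summarize manual repairs per ID across all hosts (hypothetical helper)."""
    by_id = defaultdict(list)
    for row in rows:
        by_id[row["Id"]].append(row)

    summary = {}
    for repair_id, group in by_id.items():
        percentages = [float(row["Repaired(%)"]) for row in group]
        summary[repair_id] = {
            "hosts": len(group),
            # A repair counts as done only when every host reports COMPLETED.
            "done": all(row["Status"] == "COMPLETED" for row in group),
            "failed": any(row["Status"] == "ERROR" for row in group),
            "average_repaired": sum(percentages) / len(percentages),
        }
    return summary


progress = repair_progress(ROWS)
for repair_id, info in progress.items():
    print(repair_id, info)
```

In this data the first repair is still pending on one of its two hosts, so it is not reported as done even though the other host shows 100.00.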

repair-info

In this example, we will use ecctool repair-info --duration 5m to check how much of each table is repaired. The output shows the cluster-wide repair information for all tables over the past 5 minutes.

Time window between '2022-09-23 13:12:54' and '2022-09-23 13:17:54'
---------------------------------------------------------------------------------
| Keyspace              | Table              | Repaired (%) | Repair time taken |
---------------------------------------------------------------------------------
| keyspaceWithCamelCase | tableWithCamelCase | 73.73        | 3 seconds         |
| test                  | table1             | 73.73        | 3 seconds         |
| test                  | table2             | 73.73        | 4 seconds         |
| test2                 | table1             | 73.73        | 3 seconds         |
| test2                 | table2             | 73.73        | 4 seconds         |
---------------------------------------------------------------------------------

Looking at the example output above, the columns are:

Keyspace - the keyspace the repair information corresponds to.

Table - the table the repair information corresponds to.

Repaired (%) - the repaired ranges vs total ranges of the table in %.

Repair time taken - the time taken for Cassandra to finish the repairs.

By default, repair-info fetches the information on a cluster level. To check the repair information for the local node, use the --local flag.
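The first line of the output states the time window that the figures cover, which should match the requested duration. As a small sanity check, the Python sketch below extracts the two timestamps and computes the length of the window. The function name parse_time_window is illustrative, and the pattern assumes the line format shown in the example above.

```python
import re
from datetime import datetime

WINDOW_RE = re.compile(r"Time window between '([^']+)' and '([^']+)'")
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_time_window(line):
    """Return (start, end) datetimes from the first line of repair-info output."""
    match = WINDOW_RE.search(line)
    if match is None:
        raise ValueError(f"not a time window line: {line!r}")
    start, end = (datetime.strptime(text, TIMESTAMP_FORMAT) for text in match.groups())
    return start, end


start, end = parse_time_window(
    "Time window between '2022-09-23 13:12:54' and '2022-09-23 13:17:54'"
)
print(end - start)  # 0:05:00, matching --duration 5m
```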

running-job

In this example, we will use ecctool running-job to check if any job is currently running. It gives one of these two responses:

No job is currently running

or

Job ID: x-x-x-x-x, Status: Running
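A script that waits for ecChronos to become idle could tell the two responses apart as sketched below in Python. The function parse_running_job is a hypothetical helper, and the pattern assumes the response text is exactly as shown above.

```python
import re

JOB_RE = re.compile(r"Job ID: ([^,\s]+), Status: (\w+)")


def parse_running_job(response):
    """Return (job_id, status) if a job is running, or None if no job is running."""
    match = JOB_RE.search(response)
    if match is None:
        return None  # covers the "No job is currently running" response
    return match.group(1), match.group(2)


print(parse_running_job("No job is currently running"))
print(parse_running_job("Job ID: x-x-x-x-x, Status: Running"))
```

The second call uses the placeholder ID from the example; a real response would carry the actual job ID in its place.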