Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Query Serving Docs #1293

Merged
merged 8 commits into from
Jan 5, 2023
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 7 additions & 5 deletions content/en/docs/15.0/reference/features/mysql-replication.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,23 +12,25 @@ Vitess makes use of MySQL Replication for both high availability and to receive

## Semi-Sync

Vitess strongly recommends the use of Semi-synchronous replication for High Availability. When enabled in Vitess, _semi-sync_ has the following characteristics:
Vitess strongly recommends the use of Semi-synchronous replication for High Availability. The characteristics of semi-sync are replication governed by the [Durability Policy](../../../user-guides/configuration-basic/durability_policy) configured for the keyspace.
Some characteristics are shared by all the policies -

* The primary will only accept writes if it has at least one replica connected, and configured correctly to send semi-sync ACKs. Vitess configures the semi-sync timeout to essentially an unlimited number so that it will never fallback to asyncronous replication. This is important to prevent split brain (or alternate futures) in case of a network partition. If we can verify all replicas have stopped replicating, we know the old primary is not accepting writes, even if we are unable to contact the old primary itself.
* Vitess configures the semi-sync timeout to essentially an unlimited number so that it will never fallback to asyncronous replication. This is important to prevent split brain (or alternate futures) in case of a network partition. If we can verify all replicas have stopped replicating, we know the old primary is not accepting writes, even if we are unable to contact the old primary itself.

* Tablets of type rdonly will not send semi-sync ACKs. This is intentional because rdonly tablets are not eligible to be promoted to primary, so Vitess avoids the case where a rdonly tablet is the single best candidate for election at the time of primary failure.
* All pre-configured durability policies do not allow tablets of type rdonly to send semi-sync ACKs. This is intentional because rdonly tablets are not eligible to be promoted to primary, so Vitess avoids the case where a rdonly tablet is the single best candidate for election at the time of primary failure.

These behaviors combine to give you the property that, in case of primary failure, there is at least one other replica that has every transaction that was ever reported to clients as having completed. You can then ([manually](../../programs/vtctl/#emergencyreparentshard), or [using Orchestrator](../../../user-guides/configuration-advanced/integration-with-orchestrator/) to pick the replica that is farthest ahead in GTID position and promote that to be the new primary.
Having semi-sync enabled, gives you the property that, in case of primary failure, there is at least one other replica that has every transaction that was ever reported to clients as having completed. You can then wait for [VTOrc](../../../user-guides/configuration-basic/vtorc) to repair it, or ([manually](../../programs/vtctl/#emergencyreparentshard) pick the replica that is farthest ahead in GTID position and promote that to be the new primary.

Thus, you can survive sudden primary failure without losing any transactions that were reported to clients as completed. In MySQL 5.7+, this guarantee is strengthened slightly to preventing loss of any transactions that were ever **committed** on the original primary, eliminating so-called [phantom reads](http://bugs.mysql.com/bug.php?id=62174).

On the other hand these behaviors also give a requirement that each shard must have at least 2 tablets with type *replica* (with addition of the primary that can be demoted to type *replica* this gives a minimum of 3 tablets with initial type *replica*). This will allow for the primary to have a semi-sync acker when one of the replica tablets is down for any reason (for a version update, machine reboot, schema swap or anything else).
These requirements will changed based on the durability policy.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will changed -> will change


With regard to replication lag, note that this does **not** guarantee there is always at least one replica from which queries will always return up-to-date results. Semi-sync guarantees that at least one replica has the transaction in its relay log, but it has not necessarily been applied yet. The only way Vitess guarantees a fully up-to-date read is to send the request to the primary.

## Database Schema Considerations

* Row-based replication requires that replicas have the same schema as the primary, and corruption will likely occur if the column order does not match. Earlier versions of Vitess which used Statement-Based replication recommended applying schema changes on replicas first, and then swapping their role to primary. This method is no longer recommended and a tool such as [`gh-ost`](https://github.com/github/gh-ost) or [`pt-online-schema-change`](https://www.percona.com/doc/percona-toolkit/LATEST/pt-online-schema-change.html) should be used instead.
* Row-based replication requires that replicas have the same schema as the primary, and corruption will likely occur if the column order does not match. Earlier versions of Vitess which used Statement-Based replication recommended applying schema changes on replicas first, and then swapping their role to primary. This method is no longer recommended in favour of the use of Online-DDL. More information can be found [here](../../../user-guides/schema-changes).

* Using a column of type `FLOAT` or `DOUBLE` as part of a Primary Key is not supported. This limitation is because Vitess may try to execute a query for a value (for example 2.2) which MySQL will return zero results, even when the approximate value is present.

Expand Down
4 changes: 2 additions & 2 deletions content/en/docs/15.0/reference/features/sharding.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ Sharding is a method of horizontally partitioning a database to store data acros

A keyspace in Vitess can be sharded or unsharded. An unsharded keyspace maps directly to a MySQL database. If sharded, the rows of the keyspace are partitioned into different databases of identical schema.

For example, if an application's "user" keyspace is split into two shards, each shard contains records for approximately half of the application's users. Similarly, each user's information is stored in only one shard.
For example, if an application's "user" keyspace is split into two shards, each shard contains records for approximately half of the application's users. Each single user's information however, is stored in only one shard.

Note that sharding is orthogonal to (MySQL) replication. A Vitess shard typically contains one MySQL primary and many MySQL replicas. The primary handles write operations, while replicas handle read-only traffic, batch processing operations, and other tasks. Each MySQL instance within the shard should have the same data, excepting some replication lag.

Expand Down Expand Up @@ -43,7 +43,7 @@ When building the serving graph for a sharded keyspace, Vitess ensures that each
* An empty start value represents the lowest value, and all values are greater than it.
* An empty end value represents a value larger than the highest possible value, and all values are strictly lower than it.

Vitess always converts sharding keys to a left-justified binary string for computing a shard. This left-justification makes the right-most zeroes insignificant and optional. Therefore, the value `0x80` is always the middle value for sharding keys. So, in a keyspace with two shards, sharding keys that have a binary value lower than 0x80 are assigned to one shard. Keys with a binary value equal to or higher than 0x80 are assigned to the other shard.
Vitess always converts sharding keys to a left-justified binary string for computing a shard. This left-justification makes the right-most zeroes insignificant and optional. Therefore, the value `0x80` is the middle value for sharding keys. So, in a keyspace with two shards, sharding keys that have a binary value lower than 0x80 are assigned to the first shard. Keys with a binary value equal to or higher than 0x80 are assigned to the other shard.

Several sample key ranges are shown below:

Expand Down
4 changes: 2 additions & 2 deletions content/en/docs/15.0/reference/features/vindexes.md
Original file line number Diff line number Diff line change
Expand Up @@ -64,7 +64,7 @@ The lookup table that implements a Lookup Vindex can be sharded or unsharded. N

Vitess allows for the transparent population of these lookup table rows by assigning an owner table, which is the main table that requires this lookup. When a row is inserted into this owner table, the lookup row for it is created in the lookup table. The lookup row is also deleted upon a delete of the corresponding row in the owner table. These essentially result in distributed transactions, which traditionally require 2PC to guarantee atomicity.

Consistent lookup vindexes use an alternate approach that makes use of careful locking and transaction sequences to guarantee consistency without using 2PC. This gives the best of both worlds, with the benefit of a consistent cross-shard vindex without paying the price of 2PC. To read more about what makes a consistent lookup vindex different from a standard lookup vindex read our [consistent lookup vindexes design documentation](../../../design-docs/query-serving/clookup-vindex/).
Consistent lookup vindexes use an alternate approach that makes use of careful locking and transaction sequences to guarantee consistency without using 2PC. This gives the best of both worlds, with the benefit of a consistent cross-shard vindex without paying the price of 2PC. To read more about what makes a consistent lookup vindex different from a standard lookup vindex read our [consistent lookup vindexes design documentation](../../../../design-docs/query-serving/clookup-vindex).

There are currently two vindex types in Vitess for consistent lookup:

Expand Down Expand Up @@ -170,7 +170,7 @@ Vindex Type | Cost

In the case of a simple select, Vitess scans the WHERE clause to match references to Vindex columns and chooses the best one to use. If there is no match and the query is simple without complex constructs like aggregates, etc., it is sent to all shards.

Vitess can handle more complex queries. For now, refer to the [design doc](https://github.com/vitessio/vitess/blob/main/doc/V3HighLevelDesign.md) for background information on how it handles them.
Vitess can handle more complex queries with the new Gen4 planner.

#### Insert

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,7 @@ exposed to the clients only when safe.
Alternative to auto-incrementing IDs are:

* Using a 64 bit random generator number. Try to insert a new row with that
ID. If taken (because the statement returns an integrity error), try another
ID. If already taken (the statement returns an integrity error), try another
ID.

* Using a UUID scheme, and generate truly unique IDs.
Expand Down
2 changes: 2 additions & 0 deletions content/en/docs/15.0/reference/features/vschema.md
Original file line number Diff line number Diff line change
Expand Up @@ -299,6 +299,8 @@ If a query goes across multiple shards and ordering is needed on the `name` colu

If `column_list_authoritative` is false or not specified, then VTGate will treat the list of columns as partial and will not automatically expand open-ended constructs like `select *`.

Vtgates also have the capability to track the schema changes and populate the columns list on its own. To know more about this feature, read [here](../schema-tracking).

### Advanced usage

The examples/demo also shows more tricks you can perform:
Expand Down
12 changes: 7 additions & 5 deletions content/en/docs/16.0/reference/features/mysql-replication.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,23 +12,25 @@ Vitess makes use of MySQL Replication for both high availability and to receive

## Semi-Sync

Vitess strongly recommends the use of Semi-synchronous replication for High Availability. When enabled in Vitess, _semi-sync_ has the following characteristics:
Vitess strongly recommends the use of Semi-synchronous replication for High Availability. The characteristics of semi-sync are replication governed by the [Durability Policy](../../../user-guides/configuration-basic/durability_policy) configured for the keyspace.
Some characteristics are shared by all the policies -

* The primary will only accept writes if it has at least one replica connected, and configured correctly to send semi-sync ACKs. Vitess configures the semi-sync timeout to essentially an unlimited number so that it will never fallback to asyncronous replication. This is important to prevent split brain (or alternate futures) in case of a network partition. If we can verify all replicas have stopped replicating, we know the old primary is not accepting writes, even if we are unable to contact the old primary itself.
* Vitess configures the semi-sync timeout to essentially an unlimited number so that it will never fallback to asyncronous replication. This is important to prevent split brain (or alternate futures) in case of a network partition. If we can verify all replicas have stopped replicating, we know the old primary is not accepting writes, even if we are unable to contact the old primary itself.

* Tablets of type rdonly will not send semi-sync ACKs. This is intentional because rdonly tablets are not eligible to be promoted to primary, so Vitess avoids the case where a rdonly tablet is the single best candidate for election at the time of primary failure.
* All pre-configured durability policies do not allow tablets of type rdonly to send semi-sync ACKs. This is intentional because rdonly tablets are not eligible to be promoted to primary, so Vitess avoids the case where a rdonly tablet is the single best candidate for election at the time of primary failure.

These behaviors combine to give you the property that, in case of primary failure, there is at least one other replica that has every transaction that was ever reported to clients as having completed. You can then ([manually](../../programs/vtctl/#emergencyreparentshard), or [using Orchestrator](../../../user-guides/configuration-advanced/integration-with-orchestrator/) to pick the replica that is farthest ahead in GTID position and promote that to be the new primary.
Having semi-sync enabled, gives you the property that, in case of primary failure, there is at least one other replica that has every transaction that was ever reported to clients as having completed. You can then wait for [VTOrc](../../../user-guides/configuration-basic/vtorc) to repair it, or ([manually](../../programs/vtctl/#emergencyreparentshard) pick the replica that is farthest ahead in GTID position and promote that to be the new primary.

Thus, you can survive sudden primary failure without losing any transactions that were reported to clients as completed. In MySQL 5.7+, this guarantee is strengthened slightly to preventing loss of any transactions that were ever **committed** on the original primary, eliminating so-called [phantom reads](http://bugs.mysql.com/bug.php?id=62174).

On the other hand these behaviors also give a requirement that each shard must have at least 2 tablets with type *replica* (with addition of the primary that can be demoted to type *replica* this gives a minimum of 3 tablets with initial type *replica*). This will allow for the primary to have a semi-sync acker when one of the replica tablets is down for any reason (for a version update, machine reboot, schema swap or anything else).
These requirements will changed based on the durability policy.

With regard to replication lag, note that this does **not** guarantee there is always at least one replica from which queries will always return up-to-date results. Semi-sync guarantees that at least one replica has the transaction in its relay log, but it has not necessarily been applied yet. The only way Vitess guarantees a fully up-to-date read is to send the request to the primary.

## Database Schema Considerations

* Row-based replication requires that replicas have the same schema as the primary, and corruption will likely occur if the column order does not match. Earlier versions of Vitess which used Statement-Based replication recommended applying schema changes on replicas first, and then swapping their role to primary. This method is no longer recommended and a tool such as [`gh-ost`](https://github.com/github/gh-ost) or [`pt-online-schema-change`](https://www.percona.com/doc/percona-toolkit/LATEST/pt-online-schema-change.html) should be used instead.
* Row-based replication requires that replicas have the same schema as the primary, and corruption will likely occur if the column order does not match. Earlier versions of Vitess which used Statement-Based replication recommended applying schema changes on replicas first, and then swapping their role to primary. This method is no longer recommended in favour of the use of Online-DDL. More information can be found [here](../../../user-guides/schema-changes).

* Using a column of type `FLOAT` or `DOUBLE` as part of a Primary Key is not supported. This limitation is because Vitess may try to execute a query for a value (for example 2.2) which MySQL will return zero results, even when the approximate value is present.

Expand Down
4 changes: 2 additions & 2 deletions content/en/docs/16.0/reference/features/sharding.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ Sharding is a method of horizontally partitioning a database to store data acros

A keyspace in Vitess can be sharded or unsharded. An unsharded keyspace maps directly to a MySQL database. If sharded, the rows of the keyspace are partitioned into different databases of identical schema.

For example, if an application's "user" keyspace is split into two shards, each shard contains records for approximately half of the application's users. Similarly, each user's information is stored in only one shard.
For example, if an application's "user" keyspace is split into two shards, each shard contains records for approximately half of the application's users. Each single user's information however, is stored in only one shard.

Note that sharding is orthogonal to (MySQL) replication. A Vitess shard typically contains one MySQL primary and many MySQL replicas. The primary handles write operations, while replicas handle read-only traffic, batch processing operations, and other tasks. Each MySQL instance within the shard should have the same data, excepting some replication lag.

Expand Down Expand Up @@ -43,7 +43,7 @@ When building the serving graph for a sharded keyspace, Vitess ensures that each
* An empty start value represents the lowest value, and all values are greater than it.
* An empty end value represents a value larger than the highest possible value, and all values are strictly lower than it.

Vitess always converts sharding keys to a left-justified binary string for computing a shard. This left-justification makes the right-most zeroes insignificant and optional. Therefore, the value `0x80` is always the middle value for sharding keys. So, in a keyspace with two shards, sharding keys that have a binary value lower than 0x80 are assigned to one shard. Keys with a binary value equal to or higher than 0x80 are assigned to the other shard.
Vitess always converts sharding keys to a left-justified binary string for computing a shard. This left-justification makes the right-most zeroes insignificant and optional. Therefore, the value `0x80` is the middle value for sharding keys. So, in a keyspace with two shards, sharding keys that have a binary value lower than 0x80 are assigned to the first shard. Keys with a binary value equal to or higher than 0x80 are assigned to the other shard.

Several sample key ranges are shown below:

Expand Down
Loading