Commit 3be249d

Merge branch '5.6' into beta-rel-5.7.12-25.16
Conflicts: CMakeLists.txt README VERSION build-ps/debian/percona-xtradb-cluster-server-5.6.docs build-ps/rpm/mysql-systemd cmake/wsrep.cmake doc/source/conf.py doc/source/release-notes/release-notes_index.rst include/mysql/thread_pool_priv.h man/comp_err.1 man/innochecksum.1 man/msql2mysql.1 man/my_print_defaults.1 man/myisam_ftdump.1 man/myisamchk.1 man/myisamlog.1 man/myisampack.1 man/mysql-stress-test.pl.1 man/mysql-test-run.pl.1 man/mysql.1 man/mysql.server.1 man/mysql_client_test.1 man/mysql_config.1 man/mysql_config_editor.1 man/mysql_convert_table_format.1 man/mysql_find_rows.1 man/mysql_fix_extensions.1 man/mysql_install_db.1 man/mysql_plugin.1 man/mysql_secure_installation.1 man/mysql_setpermission.1 man/mysql_tzinfo_to_sql.1 man/mysql_upgrade.1 man/mysql_waitpid.1 man/mysql_zap.1 man/mysqlaccess.1 man/mysqladmin.1 man/mysqlbinlog.1 man/mysqlbug.1 man/mysqlcheck.1 man/mysqld.8 man/mysqld_multi.1 man/mysqld_safe.1 man/mysqldump.1 man/mysqldumpslow.1 man/mysqlhotcopy.1 man/mysqlimport.1 man/mysqlshow.1 man/mysqlslap.1 man/mysqltest.1 man/ndb-common-options.1 man/ndb_blob_tool.1 man/ndb_config.1 man/ndb_cpcd.1 man/ndb_delete_all.1 man/ndb_desc.1 man/ndb_drop_index.1 man/ndb_drop_table.1 man/ndb_error_reporter.1 man/ndb_index_stat.1 man/ndb_mgm.1 man/ndb_mgmd.8 man/ndb_print_backup_file.1 man/ndb_print_schema_file.1 man/ndb_print_sys_file.1 man/ndb_restore.1 man/ndb_select_all.1 man/ndb_select_count.1 man/ndb_setup.py.1 man/ndb_show_tables.1 man/ndb_size.pl.1 man/ndb_waiter.1 man/ndbd.8 man/ndbd_redo_log_reader.1 man/ndbinfo_select_all.1 man/ndbmtd.8 man/perror.1 man/replace.1 man/resolve_stack_dump.1 man/resolveip.1 mysql-test/suite/binlog/r/binlog_stm_binlog.result mysql-test/suite/galera/r/galera_as_slave_nonprim.result mysql-test/suite/galera/r/galera_defaults.result mysql-test/suite/galera/t/galera_as_slave_nonprim.test mysql-test/suite/galera/t/galera_defaults.test mysql-test/suite/galera/t/galera_var_dirty_reads.test mysql-test/suite/perfschema/t/no_threads.test mysql-test/valgrind.supp plugin/query_response_time/plugin.cc scripts/mysql_config.sh sql/mysqld.cc sql/mysqld.h sql/scheduler.cc sql/sql_base.cc sql/sql_class.h sql/sql_parse.cc sql/sys_vars.cc sql/threadpool_common.cc sql/wsrep_mysqld.cc sql/wsrep_var.cc storage/innobase/handler/ha_innodb.cc storage/innobase/include/univ.i storage/innobase/os/os0file.cc
2 parents: 72360f0 + 33f1e24

55 files changed (+1472, -277 lines)

CMakeLists.txt (+1, -3)

@@ -710,9 +710,7 @@ IF(NOT INSTALL_LAYOUT MATCHES "RPM")
 )
 INSTALL(FILES README.MySQL DESTINATION ${INSTALL_DOCREADMEDIR} COMPONENT Readme)
 INSTALL(FILES ${CMAKE_BINARY_DIR}/Docs/INFO_SRC ${CMAKE_BINARY_DIR}/Docs/INFO_BIN DESTINATION ${INSTALL_DOCDIR})
-IF(UNIX)
-INSTALL(FILES Docs/README-wsrep DESTINATION ${INSTALL_DOCREADMEDIR} COMPONENT Readme)
-ENDIF()
+INSTALL(FILES Docs/README-wsrep DESTINATION ${INSTALL_DOCREADMEDIR} COMPONENT Readme)
 # MYSQL_DOCS_LOCATON is used in "make dist", points to the documentation directory
 SET(MYSQL_DOCS_LOCATION "" CACHE PATH "Location from where documentation is copied")
 MARK_AS_ADVANCED(MYSQL_DOCS_LOCATION)

README (+8, -7)

@@ -1,11 +1,12 @@
-Percona Server 5.7
-------------------
+Percona XtraDB Cluster 5.7
+--------------------------
 
-Percona Server is a branch of MySQL 5.7 bringing higher performance,
-reliability and more features.
+Percona XtraDB Cluster is based on Percona Server and Codership wsrep API
+using galera replication library that provides Multimaster Cluster based
+on synchronous replication.
 
-http://www.percona.com/software/percona-server/
+https://www.percona.com/software/mysql-database/percona-xtradb-cluster
 
-Documentation: http://www.percona.com/doc/percona-server/5.7
+Documentation: https://www.percona.com/doc/percona-xtradb-cluster/5.7/index.html
 
-Launchpad (bugs, milestones, branches): http://launchpad.net/percona-server
+Launchpad (bugs, milestones, branches): https://bugs.launchpad.net/percona-xtradb-cluster/

WSREP-REVISION (+1, -1)

@@ -1 +1 @@
-1a089b3
+3e97f27

build-ps/rpm/mysql-systemd (+22)

@@ -151,6 +151,28 @@ wait_for_pid () {
     fi
 }
 
+pinger () {
+    # Wait for ping to answer to signal startup completed,
+    # might take a while in case of e.g. crash recovery
+    # MySQL systemd service will timeout script if no answer
+    datadir=$(parse_cnf datadir server mysqld)
+    if [[ -z ${datadir:-} ]]; then
+        datadir="/var/lib/mysql"
+    fi
+    socket=$(parse_cnf socket server mysqld)
+    case $socket in
+        /*) adminsocket="$socket" ;;
+        "") adminsocket="$datadir/mysql.sock" ;;
+        *)  adminsocket="$datadir/$socket" ;;
+    esac
+
+    while /bin/true ; do
+        sleep 1
+        mysqladmin --no-defaults --socket="$adminsocket" --user=UNKNOWN_MYSQL_USER ping >/dev/null 2>&1 && break
+    done
+    exit 0
+}
+
 
 action=$1
 manager=${2:-0}

cmake/package_name.cmake (+2, -1)

@@ -122,9 +122,10 @@ IF(NOT VERSION)
   IF(NOT WSREP_VERSION)
     MESSAGE(FATAL_ERROR "Variable WSREP_VERSION must be set")
   ENDIF()
-  SET(package_name "mysql-wsrep${PRODUCT_TAG}-${VERSION}-${WSREP_VERSION}-${SYSTEM_NAME_AND_PROCESSOR}")
+  SET(package_name "percona-xtradb-cluster${PRODUCT_TAG}-${VERSION}-${WSREP_VERSION}-${SYSTEM_NAME_AND_PROCESSOR}")
 ELSE()
   SET(package_name "mysql${PRODUCT_TAG}-${VERSION}-${SYSTEM_NAME_AND_PROCESSOR}")
+  SET(package_name "percona-server${PRODUCT_TAG}-${VERSION}-${SYSTEM_NAME_AND_PROCESSOR}")
 ENDIF()
 ENDIF()

cmake/wsrep.cmake (+1, -1)

@@ -17,7 +17,7 @@
 # so WSREP_VERSION is produced regardless
 
 # Set the patch version
-SET(WSREP_PATCH_VERSION "14.2")
+SET(WSREP_PATCH_VERSION "15")
 INCLUDE(CheckFunctionExists)
 CHECK_FUNCTION_EXISTS(execvpe HAVE_EXECVPE)

doc/source/conf.py (+2, -2)

@@ -53,9 +53,9 @@
 # built documents.
 #
 # The short X.Y version.
-version = '5.6.28'
+version = '5.6.29'
 # The full version, including alpha/beta/rc tags.
-release = '5.6.28-25.14'
+release = '5.6.29-25.15'
 
 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.

doc/source/index.rst (+3)

@@ -73,6 +73,9 @@ User's Manual
    manual/restarting_nodes
    manual/failover
    manual/monitoring
+   manual/certification
+   manual/threading_model
+   manual/gcache_record-set_cache_difference
 
 How-tos
 =======

doc/source/manual/certification.rst (new file, +186 lines)

.. _certification:

=======================================
Certification in Percona XtraDB Cluster
=======================================

|Percona XtraDB Cluster| replicates actions executed on one node to all other
nodes in the cluster and makes this fast enough to appear as if it is
synchronous (also known as virtually synchronous).

There are two main types of actions: DDL and DML. DDL actions are executed
using Total Order Isolation (let's ignore Rolling Schema Upgrade for now) and
DML using the normal Galera replication protocol.

.. note::

   This manual page assumes the reader is aware of Total Order Isolation and
   the MySQL replication protocol.

DML (``INSERT``/``UPDATE``/``DELETE``) operations effectively change the state
of the database, and all such operations are recorded in |XtraDB| by
registering a unique object identifier (aka key) for each change (an update
or a new addition).

* A transaction can change "n" different data objects. Each such object change
  is recorded in |XtraDB| using a so-called ``append_key`` operation. The
  ``append_key`` operation registers the key of the data object that has
  undergone a change by the transaction. The key for rows can be represented in
  three parts as ``db_name``, ``table_name``, and ``pk_columns_for_table`` (if
  ``pk`` is absent, a hash of the complete row is calculated). In short, this
  is quick and compact meta information about which rows the transaction has
  touched/modified. It is passed on as part of the write-set for certification
  to all the nodes of the cluster while the transaction is in the commit phase
  (a short illustrative sketch of such a key follows this list).

* For a transaction to commit, it has to pass XtraDB/Galera certification,
  ensuring that transactions don't conflict with any other changes posted on
  the cluster group/channel. Certification will add the keys modified by the
  given transaction to its own central certification vector (CCV), represented
  by ``cert_index_ng``. If the said key is already part of the vector, then
  conflict resolution checks are triggered.

* Conflict resolution traces the reference transaction (the one that last
  modified this item in the cluster group). If this reference transaction is
  from some other node, that means the same data was modified by the other
  node, and the changes of that node have already been certified by the local
  node that is executing the check. In such cases, the transaction that
  arrived later fails to certify.

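To make the key structure concrete, here is a minimal sketch of how such a
certification key might be composed. This is illustration only: the type and
function names are invented, and this is not the actual wsrep/Galera code.

.. code-block:: cpp

   // Illustrative sketch only -- not the wsrep/Galera API.
   #include <functional>
   #include <string>
   #include <vector>

   // A certification key: {db, table, pk columns} -- or, when the table has
   // no primary key, {db, table, hash(full row image)}.
   struct CertKey {
       std::string db;
       std::string table;
       std::string pk_or_row_hash;
   };

   // "append_key": record the key of a data object touched by the transaction.
   CertKey make_key(const std::string& db, const std::string& table,
                    const std::vector<std::string>& pk_columns,
                    const std::string& full_row_image) {
       if (!pk_columns.empty()) {
           std::string pk;
           for (const auto& c : pk_columns) { pk += c; pk.push_back('\0'); }
           return {db, table, pk};
       }
       // No primary key: fall back to a hash of the complete row image.
       return {db, table,
               std::to_string(std::hash<std::string>{}(full_row_image))};
   }
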
Changes made to DB objects are bin-logged. This is the same as how |MySQL|
does it for replication with its Master-Slave ecosystem, except that a packet
of changes from a given transaction is created and named as a write-set.

Once the client/user issues a ``COMMIT``, |Percona XtraDB Cluster| will run a
commit hook. Commit hooks ensure the following:

* Flush the binary logs.

* Check if the transaction needs replication (not needed for read-only
  transactions like ``SELECT``).

* If a transaction needs replication, then it invokes a ``pre_commit`` hook in
  the Galera ecosystem. During this pre-commit hook, a write-set is written to
  the group channel by a "replicate" operation. All nodes (including the one
  that executed the transaction) subscribe to this group channel and read
  the write-set.

* ``gcs_recv_thread`` is the first to receive the packet, which is then
  processed through different action handlers.

* Each packet read from the group channel is assigned an ``id``, which is a
  counter maintained locally by each node, in sync with the group. When a new
  node joins the group/cluster, its seed id is initialized to the current
  active id of the group/cluster. (There is an inherent assumption/protocol
  enforcement that all nodes read the packets from the channel in the same
  order; that way, even though a packet doesn't carry ``id`` information, the
  id is inherently established using the locally maintained ``id`` value.)

.. code-block:: cpp

   /* Common situation -
    * increment and assign act_id only for totally ordered actions
    * and only in PRIM (skip messages while in state exchange) */
   rcvd->id = ++group->act_id_;

[This is an elegant way to solve the problem of id coordination in a
multi-master system; otherwise a node would first have to obtain an id from a
central system, or through a separately agreed protocol, and only then use it
for the packet, thereby doubling the round-trip time.]

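As a toy illustration of why this works (hypothetical code, not from the
source tree): because every node consumes the same totally ordered stream, a
purely local counter yields identical ids on every node, with no extra round
trip.

.. code-block:: cpp

   // Toy model: two nodes replay the same totally ordered stream and assign
   // ids from purely local counters -- the ids agree without coordination.
   #include <cassert>
   #include <cstdint>
   #include <vector>

   struct Node { int64_t act_id = 0; };

   int64_t deliver(Node& n) { return ++n.act_id; }  // rcvd->id = ++group->act_id_;

   int main() {
       Node n1, n2;
       std::vector<int64_t> ids1, ids2;
       for (int i = 0; i < 5; ++i) {   // the same 5 ordered messages reach both nodes
           ids1.push_back(deliver(n1));
           ids2.push_back(deliver(n2));
       }
       assert(ids1 == ids2);           // identical ids, no round trip needed
       return 0;
   }
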
What happens if two nodes get ready with their packets at the same time?

* Both nodes will be allowed to put their packets on the channel. That means
  the channel will see packets from different nodes queued one behind another.

* It is interesting to understand what happens if two nodes modify the same
  set of rows. For example:

.. code-block:: bash

   create -> insert (1,2,3,4)....nodes are in sync till this point.
   node-1: update i = i + 10;
   node-2: update i = i + 100;

   Let's associate a transaction-id (trx-id) with the update transaction that
   is executed on node-1 and node-2 in parallel. (The real algorithm is a bit
   more involved (with uuid + seqno), but conceptually the same, so for ease
   we're using trx_id here.)

   node-1:
   update action: trx-id=n1x
   node-2:
   update action: trx-id=n2x

Both node packets are added to the channel, but the transactions are
conflicting. Let's see which one succeeds. The protocol says: FIRST WRITE WINS.
So in this case, whoever is first to write to the channel will get certified.
Let's say node-2 is first to write its packet, and node-1 writes immediately
after it.

.. note::
   Each node subscribes to all packets, including its own packet. See below
   for details.

Node-2:

- Will see its own packet and will process it.
- Then it will see the node-1 packet, which it tries to certify, but the
  certification fails.

Node-1:

- Will see the node-2 packet and will process it. (Note: InnoDB allows
  isolation, so node-1 can process node-2 packets independently of node-1
  transaction changes.)
- Then it will see the node-1 packet, which it tries to certify, but the
  certification fails. (Note that even though the packet originated from
  node-1, it still undergoes certification, to catch cases like these. This
  is the beauty of listening to your own events: it keeps the processing path
  consistent even when events are generated locally.)

The certification protocol can be described using the example from above. As
discussed above, the central certification vector (CCV) is updated to reflect
the reference transaction.

Node-2:

- node-2 sees its own packet for certification, adds it to its local CCV, and
  performs certification checks. Once these checks pass, it updates the
  reference transaction by setting it to ``n2x``.
- node-2 then gets the node-1 packet for certification. The said key is
  already present in the CCV with the reference transaction set to ``n2x``,
  whereas the write-set proposes setting it to ``n1x``. This causes a
  conflict, which in turn causes the node-1 originated transaction to fail
  the certification test.

This flags a certification failure, and the node-1 packet is rejected.

Node-1:

- node-1 sees the node-2 packet for certification; it is processed, the local
  CCV is updated, and the reference transaction is set to ``n2x``.
- For the same reason explained above, node-1 certification also rejects the
  node-1 packet.

This suggests that the node doesn't need to wait for certification to complete,
but just needs to ensure that the packet is written to the channel. The applier
transaction will always win and the local conflicting transaction will be
rolled back.

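The conflict check itself can be pictured with a small sketch. The names and
structure below are invented for illustration; the real ``cert_index_ng`` code
is considerably more involved.

.. code-block:: cpp

   // Minimal sketch of "first write wins" certification -- illustration only.
   #include <cstdint>
   #include <map>
   #include <string>

   struct WriteSetInfo {
       std::string trx_id;     // e.g. "n1x", "n2x"
       int         source;     // originating node
       int64_t     seqno;      // position in the total delivery order
       int64_t     last_seen;  // highest seqno applied when the trx replicated
   };

   // Central certification vector (CCV): key -> reference (last writer) trx.
   std::map<std::string, WriteSetInfo> ccv;

   bool certify(const std::string& key, const WriteSetInfo& ws) {
       auto it = ccv.find(key);
       if (it != ccv.end() &&
           it->second.seqno > ws.last_seen &&   // concurrent writer ordered first...
           it->second.source != ws.source) {    // ...and from another node
           return false;                        // conflict: the later packet loses
       }
       ccv[key] = ws;                           // passed: becomes the reference trx
       return true;
   }

Walking the example through this sketch: ``n2x`` reaches the channel first, so
on every node ``certify()`` installs ``n2x`` as the reference transaction for
the key; when ``n1x`` arrives, it finds a concurrent reference from another
node and fails. This is why both nodes reach the same verdict without
exchanging any further messages.
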
What happens if one of the nodes has local changes that are not synced with
the group?

.. code-block:: bash

   create (id primary key) -> insert (1), (2), (3), (4);
   node-1: wsrep_on=0; insert (5); wsrep_on=1
   node-2: insert(5).
   insert(5) will generate a write-set that will then be replicated to node-1.
   node-1 will try to apply it but will fail with duplicate-key-error, as 5
   already exists.

XtraDB will flag this as an error, which would eventually cause node-1 to
shut down.

With all that in place, how is GTID incremented if all the packets are
processed by all nodes (including ones that are rejected due to certification)?
GTID is incremented only when the transaction passes certification and is ready
for commit. That way errant packets don't cause GTID to increment. Also, don't
confuse the group packet ``id`` quoted above with GTID. Without errant packets
you may end up seeing these two counters going hand-in-hand, but they are in
no way related.
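
The relationship between the two counters can be summarised in a short sketch
(hypothetical bookkeeping, not the actual wsrep code):

.. code-block:: cpp

   // The group packet id advances for every totally ordered message, while
   // the GTID seqno advances only for write-sets that pass certification.
   #include <cstdint>

   struct Counters {
       int64_t packet_id  = 0;  // bumped for every delivered packet
       int64_t gtid_seqno = 0;  // bumped only when certification succeeds
   };

   void on_delivered(Counters& c, bool certified) {
       ++c.packet_id;           // every packet gets an id
       if (certified) {
           ++c.gtid_seqno;      // only certified transactions advance the GTID
       }
   }
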
doc/source/manual/gcache_record-set_cache_difference.rst (new file, +84 lines)

.. _gcache_record-set_cache_difference:

=========================================
Understanding GCache and Record-Set Cache
=========================================

In |Percona XtraDB Cluster| (PXC), there is a concept of GCache and Record-Set
cache (which can also be called the transaction write-set cache). The use of
these two caches is often confusing if you are running long transactions, as
both of them result in the creation of disk-level files. This manual page
describes their main differences.

Record-Set Cache
================

When you run a long-running transaction on any particular node, it will try to
append a key for each row that it tries to modify (the key is a unique
identifier for the row: ``{db, table, pk.columns}``). This information is
cached in the out-write-set, which is then sent to the group for certification.

To start with, keys are cached in HeapStore (which has ``page-size=65K`` and
``total-size=4MB``). If the transaction data-size outgrows this limit, then the
storage is switched from Heap to Page (which has a ``page-size=64MB`` and
``total-limit=free-space-on-disk``). All these limits are non-configurable, but
having a memory-page size greater than 4MB per transaction can cause things to
stall due to memory pressure, so this limit is reasonable. (This is another
limitation to address when Galera supports large transactions.)

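A rough sketch of this spill-over behaviour is shown below. The size constant
comes from the text above; the class and file name are invented for
illustration, error handling is omitted, and this is not the galera-cache
implementation.

.. code-block:: cpp

   // Rough sketch of the HeapStore -> Page (file-backed) spill-over.
   #include <cstddef>
   #include <cstdio>
   #include <vector>

   constexpr std::size_t kHeapTotalSize = 4 * 1024 * 1024;   // total-size=4MB

   class RecordSetCache {
   public:
       void append(const unsigned char* data, std::size_t len) {
           if (!spilled_ && heap_.size() + len > kHeapTotalSize) {
               spill_to_file();                    // switch from Heap to Page
           }
           if (spilled_) {
               std::fwrite(data, 1, len, file_);   // grows in 64MB page files
           } else {
               heap_.insert(heap_.end(), data, data + len);
           }
       }

   private:
       void spill_to_file() {
           file_ = std::fopen("xxxx_data.000000", "wb");  // kept until commit
           std::fwrite(heap_.data(), 1, heap_.size(), file_);
           spilled_ = true;
       }

       std::vector<unsigned char> heap_;
       bool        spilled_ = false;
       std::FILE*  file_    = nullptr;
   };
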
The same long-running transaction will also generate binlog data that is also
appended to the out-write-set on commit, using the same technique explained
above (``HeapStore->FileStore``). This data can be significant, as it is a
binlog image of the rows inserted/updated/deleted by the transaction. The
:variable:`wsrep_max_ws_size` variable controls the size of this part of the
write-set. (The threshold doesn't consider the size allocated for caching keys
(above) and the header.)

If ``FileStore`` is used, it creates a file on disk (with names like
``xxxx_keys`` and ``xxxx_data``) to store the cache data. These files are kept
until the transaction is committed, so their lifetime is tied to that of the
transaction.

When the node is done with the transaction and is about to commit, it will
generate the final write-set using the two files (if the data size grew enough
to use ``FileStore``) plus the ``HEADER``, and will publish it to the cluster
for certification.

The native node executing the transaction will also act as a subscription
node, and will receive its own write-set through the cluster publish mechanism.
This time, the native node will try to cache the write-set in its GCache. How
much data GCache retains is controlled by the GCache configuration.

GCache
======

GCache holds the write-sets published on the cluster for replication.

The lifetime of a write-set in GCache is not tied to the transaction.

When a ``JOINER`` node needs an IST, it will be serviced through this GCache
(if possible).

GCache also creates files on disk. (You can read more about it
`here <http://severalnines.com/blog/understanding-gcache-galera>`_.)

Interestingly, at any given point in time, the native node has two copies of
the write-set: one in GCache and the other in the Record-Set Cache.

For example, if you ``INSERT``/``UPDATE`` 2M rows in a table with a schema
like:

.. code-block:: mysql

   (int, char(100), char(100)) with pk (int, char(100))

it will create write-set key/data files in the background that look like this:

.. code-block:: bash

   -rw------- 1 xxx xxx 67108864 Apr 11 12:26 0x00000707_data.000000
   -rw------- 1 xxx xxx 67108864 Apr 11 12:26 0x00000707_data.000001
   -rw------- 1 xxx xxx 67108864 Apr 11 12:26 0x00000707_data.000002
   -rw------- 1 xxx xxx 67108864 Apr 11 12:26 0x00000707_keys.000000
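
To put those numbers in perspective: each file in the listing above is
67,108,864 bytes, i.e. one 64 MiB page, so 4 x 64 MiB = 256 MiB of Record-Set
Cache is sitting on disk for that single transaction, in addition to the copy
of the write-set that will also be cached in GCache once it is published.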
