-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MDEV-35151 DDL recovery for high-level indexes #3520
MDEV-35151 DDL recovery for high-level indexes #3520
Conversation
|
file->ha_rename_table(idx_from, idx_to); | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is a good approach. It feels like you put too much inside one DDL_RENAME_PHASE_TABLE
entry. There can be crashes between renames of individual vector indexes (presuming there will be many some day).
Perhaps, DDL_RENAME_PHASE_TABLE
should be one ha_rename_table
only? That is, a table with hlindexes would generate one DDL_RENAME_PHASE_TABLE
entry per index (and one for the main table, of course)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for DROP, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There can be crashes between renames of individual vector indexes (presuming there will be many some day).
Yes, sure. And my code handles it inline with what original code have been doing, that is ignores non_existing_table_errors()
, e.g. if handler files were moved and .frm not. Or as you suggested above crash between different high-level indexes rename.
... a table with hlindexes would generate one DDL_RENAME_PHASE_TABLE entry per index (and one for the main table, of course)?
PHASE
is just anenum
member of anACTION
, it can't have stuff like table name or whatever attached.ACTION
is more logical (or high-level) entity, which may do many things like binlogging, handling stat tables, triggers, handler files and of course .frm.
Here're options:
- We could add special phases to handle high-level indexes, but there's going to be no much difference between my solution. And it will introduce compatibility issues. Though it is very true that older versions won't be able to handle high-level indexes properly even with this patch.
- We can't re-use existing actions like
DDL_LOG_RENAME_TABLE_ACTION
, as I mentioned before they're way to sophisticated for high-level indexes. - In theory we could try re-using special actions like
DDL_LOG_RENAME_ACTION
for high level indexes. That is issue oneDDL_LOG_RENAME_TABLE_ACTION
and thenDDL_LOG_RENAME_ACTION
for each high-level index. It may work, it has no compatibility concerns, but it is a question if we're going to be able to integrate it with e.g.ALTER
.
Your thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I'm much of an expert in ddl logging. Let's use your approach if it works.
file->ha_rename_table(idx_from, idx_to); | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for DROP, etc.
658f587
to
70ff3b6
Compare
e231991
to
4557941
Compare
could you please rebase it on top of the latest bb-11.6-MDEV-32887-vector ? It's impossible to review now, it looks like it includes basically everything |
cadf2be
to
fe9092a
Compare
3d2f51e
to
5bea61a
Compare
1d2a5fc
to
b29dca9
Compare
a8a5206
to
8bb7868
Compare
…e upon using non-default table options and system variables extend the option_list expicitly on CREATE/ALTER, not implicitly on parsing.
…th vector key and unique key InnoDB rename needs the same workaround for hlindexes as it has for partitions
… DISTINCT don't apply distinct optimization to order by a vector index
When using the default innodb_log_buffer_size=2m, mariadb-backup --backup would spend a lot of time re-reading and re-parsing the log. For reads, it would be beneficial to memory-map the entire ib_logfile0 to the address space (typically 48 bits or 256 TiB) and read it from there, both during --backup and --prepare. We will introduce the Boolean read-only parameter innodb_log_file_mmap that will be OFF by default on most platforms, to avoid aggressive read-ahead of the entire ib_logfile0 in when only a tiny portion would be accessed. On Linux and FreeBSD the default is innodb_log_file_mmap=ON, because those platforms define a specific mmap(2) option for enabling such read-ahead and therefore it can be assumed that the default would be on-demand paging. This parameter will only have impact on the initial InnoDB startup and recovery. Any writes to the log will use regular I/O, except when the ib_logfile0 is stored in a specially configured file system that is backed by persistent memory (Linux "mount -o dax"). We also experimented with allowing writes of the ib_logfile0 via a memory mapping and decided against it. A fundamental problem would be unnecessary read-before-write in case of a major page fault, that is, when a new, not yet cached, virtual memory page in the circular ib_logfile0 is being written to. There appears to be no way to tell the operating system that we do not care about the previous contents of the page, or that the page fault handler should just zero it out. Many references to HAVE_PMEM have been replaced with references to HAVE_INNODB_MMAP. The predicate log_sys.is_pmem() has been replaced with log_sys.is_mmap() && !log_sys.is_opened(). Memory-mapped regular files differ from MAP_SYNC (PMEM) mappings in the way that an open file handle to ib_logfile0 will be retained. In both code paths, log_sys.is_mmap() will hold. Holding a file handle open will allow log_t::clear_mmap() to disable the interface with fewer operations. It should be noted that ever since commit 685d958 (MDEV-14425) most 64-bit Linux platforms on our CI platforms (s390x a.k.a. IBM System Z being a notable exception) read and write /dev/shm/*/ib_logfile0 via a memory mapping, pretending that it is persistent memory (mount -o dax). So, the memory mapping based log parsing that this change is enabling by default on Linux and FreeBSD has already been extensively tested on Linux. ::log_mmap(): If a log cannot be opened as PMEM and the desired access is read-only, try to open a read-only memory mapping. xtrabackup_copy_mmap_snippet(), xtrabackup_copy_mmap_logfile(): Copy the InnoDB log in mariadb-backup --backup from a memory mapped file.
…E with a wrong data
… area assertion failures upon EITS collection with vector type don't collect histograms for vector columns
ONLINE ALTER didn't expect XA PREPARE to fail. Mark rollback on failed prepare with the XA_ROLLBACK_ONLY state, detect that in ONLINE ALTER
…NCT with vector type add MYSQL_TYPE_VECTOR to an if() over field->type() values
This reverts commit 79b7f89.
This reverts commit db658f9.
… area assertion failures upon EITS collection with vector type
4b57276
to
4dfa605
Compare
… DISTINCT MariaDB#2 mhnsw index can be used only for ORDER BY. not for GROUP BY
…_key_to_innobase upon query from table with usual key on vector field add test
with streaming implemened mhnsw no longer needs to know the LIMIT in advance. let's just cap it to avoid allocating too much memory for the one step result set
…ry from empty table
…s and VEC_ToText accepts let VEC_FromText validate that the vector l2squared isn't NaN. VEC_ToText still prints everything.
… on table with vector field test case
…ue of mhnsw_min_limit mhnsw_min_limit must not be larger than candidates queue size
…OMTEXT with an invalid argument
Since high-level index tables do not participate in thr_multi_lock(), added explicit call to THR_LOCK::start_trans(). This is needed mostly for Aria to handle transaction logging.
5bea61a
to
3869f7a
Compare
1b6db26
to
62afa2f
Compare
cherry-picked into the vector branch, thanks! closing. |
Description
Various fixes to make DDL recovery work with high-level indexes.
Release Notes
None.
How can this PR be tested?
mtr
Basing the PR against the correct MariaDB version
main
branch.PR quality check