Reimplement BRIN internals for AO/CO tables
Motivation:

For AO/CO tables, we have the revmap explosion problem, caused by the massive gaps in logical heap block numbers across physical segment boundaries. The problem is articulated with an example in the README.

Earlier, we solved this problem with UPPER pages, which acted as a lookup table to find the revmap page for a given logical heap block number. One of the biggest shortcomings of that design was that even an empty BRIN index would take up ~3.2M at rest, because upper pages were always pre-allocated to cover all possible heap block numbers. This space would be consumed on a per-segment basis, given GPDB's MPP nature. Further, every operation involving the revmap always touched one additional page, which added overhead.

Highlights:

(1) We removed the UPPER page design in a prior commit and have now replaced it with a chaining design. We completely break away from the restriction that the revmap pages follow one another right after the metapage, in contiguous block numbers. Instead, they now point to one another in a singly linked list. Furthermore, there are up to MAX_AOREL_CONCURRENCY such linked lists of revmap pages, one list per block sequence. The heads and tails of these lists (or chains) are maintained in the metapage (and cached in the revmap access struct).

Since revmap pages are no longer contiguous for AO/CO tables, we have to additionally maintain logical page numbers (in the BrinSpecialSpace) for all revmap pages (depicted in the diagram above). These logical page numbers are used both for iterating over the revmap during scans and for extending the revmap.

We traverse these lists in order within a block sequence, and block sequence by block sequence. We never have to lock more than 1 revmap page at a time during chain traversal. Only for revmap extension do we have to lock two revmap pages: the last revmap page in the chain and the new revmap page being added.

For operations such as insert, we make use of the chain tail pointer in the metapage. Due to the append-only nature of AO/CO tables, we always write to the last logical heap block within a block sequence. Thus, unlike for heap, blocks other than the last block are never summarized as a result of an insert. So, we can safely position the revmap iterator at the end of the chain (instead of traversing the chain unnecessarily from the front).

(2) pageinspect and waldump have been modified in accordance with these changes.

(3) Whitebox tests have been added for all BRIN operations, with the exception of desummarize. These tests utilize pageinspect.

(4) WAL changes: A catalog bump is performed, as we can't change XLOG_PAGE_MAGIC, in order to avoid future merge conflicts.

(5) Created 202_wal_consistency_brin.pl under src/test/recovery as a replica of src/test/modules/brin/t/02_wal_consistency.pl, with added tests for AO/CO tables (since src/test/modules is excluded from CI).

Note: Please refer to the updated README for more details.
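To make the chaining design in highlight (1) concrete, here is a minimal in-memory sketch in C. It is not the code from this commit: every name in it (ToyIndex, ToyMetaPage, ToyRevmapPage, toy_revmap_extend, toy_revmap_find, toy_revmap_insert_target, MAX_CHAINS) is hypothetical, pages are plain array slots rather than buffers, locking and WAL are omitted, and starting logical page numbers at 1 is an arbitrary choice made for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_CHAINS    4           /* hypothetical stand-in for MAX_AOREL_CONCURRENCY */
#define MAX_PAGES     64          /* toy index size, in blocks */
#define INVALID_BLOCK UINT32_MAX  /* "no such block" marker for this sketch */

typedef uint32_t BlockNo;

/* Toy revmap page: only the chain link and the logical page number. */
typedef struct ToyRevmapPage {
    BlockNo  next;              /* next revmap page in this chain, if any */
    uint32_t logical_page_num;  /* position within the chain (from 1 here) */
} ToyRevmapPage;

/* Toy metapage: one head and one tail pointer per block sequence. */
typedef struct ToyMetaPage {
    BlockNo chain_head[MAX_CHAINS];
    BlockNo chain_tail[MAX_CHAINS];
} ToyMetaPage;

typedef struct ToyIndex {
    ToyMetaPage   meta;
    ToyRevmapPage pages[MAX_PAGES];
    BlockNo       next_free;    /* block 0 plays the role of the metapage */
} ToyIndex;

static void toy_index_init(ToyIndex *idx)
{
    memset(idx, 0, sizeof(*idx));
    for (int i = 0; i < MAX_CHAINS; i++) {
        idx->meta.chain_head[i] = INVALID_BLOCK;
        idx->meta.chain_tail[i] = INVALID_BLOCK;
    }
    idx->next_free = 1;
}

/*
 * Extend the chain of block sequence `seq` by one revmap page.
 * Only two revmap pages are touched: the current tail and the new page.
 * Returns the new block number, or INVALID_BLOCK on failure.
 */
static BlockNo toy_revmap_extend(ToyIndex *idx, int seq)
{
    if (seq < 0 || seq >= MAX_CHAINS || idx->next_free >= MAX_PAGES)
        return INVALID_BLOCK;

    BlockNo new_blk = idx->next_free++;
    BlockNo tail = idx->meta.chain_tail[seq];
    ToyRevmapPage *np = &idx->pages[new_blk];

    np->next = INVALID_BLOCK;
    if (tail == INVALID_BLOCK) {
        /* First page of this chain: it becomes the head. */
        np->logical_page_num = 1;
        idx->meta.chain_head[seq] = new_blk;
    } else {
        /* Link the old tail to the new page and continue the numbering. */
        np->logical_page_num = idx->pages[tail].logical_page_num + 1;
        idx->pages[tail].next = new_blk;
    }
    idx->meta.chain_tail[seq] = new_blk;
    return new_blk;
}

/* Scan path: walk the chain from its head, visiting one page at a time. */
static BlockNo toy_revmap_find(const ToyIndex *idx, int seq, uint32_t logical)
{
    if (seq < 0 || seq >= MAX_CHAINS)
        return INVALID_BLOCK;
    for (BlockNo b = idx->meta.chain_head[seq]; b != INVALID_BLOCK;
         b = idx->pages[b].next) {
        if (idx->pages[b].logical_page_num == logical)
            return b;
    }
    return INVALID_BLOCK;
}

/* Insert path: jump straight to the tail instead of walking from the head. */
static BlockNo toy_revmap_insert_target(const ToyIndex *idx, int seq)
{
    if (seq < 0 || seq >= MAX_CHAINS)
        return INVALID_BLOCK;
    return idx->meta.chain_tail[seq];
}
```

Interleaving extensions of two block sequences in this toy model yields chains whose physical block numbers are not contiguous, which is why each page carries its own logical page number. A scan walks from the head, while an insert reads the tail pointer directly.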
1 parent f9455b1, commit d06063d
Showing 28 changed files with 1,843 additions and 854 deletions.