Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cherry-pick series of BRIN + AO commits until 63d916b0c3 #859

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

reshke
Copy link
Contributor

@reshke reshke commented Jan 10, 2025

greenplum-db/gpdb-archive@63d916b
Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


@reshke reshke changed the title Cherry-pick series of BRIN + AO commits strictly before 63d916b0c3 Cherry-pick series of BRIN + AO commits until 63d916b0c3 Jan 11, 2025
@my-ship-it my-ship-it added the cherry-pick cherry-pick upstream commts label Jan 13, 2025
@yjhjstz
Copy link
Member

yjhjstz commented Jan 20, 2025

--- /__w/cloudberry/cloudberry/src/test/isolation2/expected/uao/brin_chain_row.out	2025-01-11 02:22:52.954141255 -0800
+++ /__w/cloudberry/cloudberry/src/test/isolation2/results/uao/brin_chain_row.out	2025-01-11 02:22:52.962141243 -0800
@@ -21,7 +21,9 @@
 INSERT INTO brin_chain_ao_row SELECT '2' FROM generate_series(1, 1000);
 INSERT 1000
 ALTER TABLE brin_chain_ao_row SET ACCESS METHOD ao_row;
-ALTER
+ERROR:  syntax error at or near "ACCESS"
+LINE 1: ALTER TABLE brin_chain_ao_row SET ACCESS METHOD ao_row;

related test need to fix.

@reshke
Copy link
Contributor Author

reshke commented Jan 20, 2025

--- /__w/cloudberry/cloudberry/src/test/isolation2/expected/uao/brin_chain_row.out	2025-01-11 02:22:52.954141255 -0800
+++ /__w/cloudberry/cloudberry/src/test/isolation2/results/uao/brin_chain_row.out	2025-01-11 02:22:52.962141243 -0800
@@ -21,7 +21,9 @@
 INSERT INTO brin_chain_ao_row SELECT '2' FROM generate_series(1, 1000);
 INSERT 1000
 ALTER TABLE brin_chain_ao_row SET ACCESS METHOD ao_row;
-ALTER
+ERROR:  syntax error at or near "ACCESS"
+LINE 1: ALTER TABLE brin_chain_ao_row SET ACCESS METHOD ao_row;

related test need to fix.

will do after #871 merge

We don't have autovacuum on user tables enabled yet to test this. So
disable it for now.
We were not correctly bounding the scan (ignoring block sequences).
Tests will be added in a later commit.
Correct the range limits in the comments for the partial range test.
This commit removes the upper page architecture for AO/CO tables. This
is in preparation of a rewrite of BRIN for AO/CO tables.

The objective of this commit is to align the existing BRIN code as much
as possible with upstream, and to ensure that tests with BRIN indexes on
heap tables pass. Existing AO/CO tests will fail.

Key points:

(1) Reverted revmap_physical_extend to not returning a BlockNumber. It
was a dead return anyway.

(2) Reverted skipExtend parameter for brin_doupdate(). Its not clear why
this was necessary. It shouldn't be necessary for heap tables.

(3) Kept isAo parameter for metapage and xlog.
Motivation:

For AO/CO tables, we have the revmap explosion problem that the massive
gaps in logical heap block numbers brought (across physical segment
boundaries). The problem is articulated with an example in the README.
Earlier, we solved this problem with the help of UPPER pages, which
acted like a lookup table to find the revmap page, given a logical heap
block number.

One of the biggest shortcomings of the design was that even an empty
BRIN index would take up ~3.2M at rest. This is because upper pages were
always pre-allocated, to cover all possible heap block numbers. This
space would be consumed on a per-segment basis, given GPDB's MPP nature.

Further, for every operation involving the revmap, there was this 1
additional page always involved, which added to overhead.

Highlights:

(1) We removed the UPPER page design in a prior commit and now have
replaced it with a chaining design.

We completely break away from the restriction that the revmap pages
follow one another right after the metapage, in contiguous block
numbers. Instead, we now have them point to one another in a singly
linked list.

Furthermore, there are up to MAX_AOREL_CONCURRENCY such linked lists of
revmap pages. There is one list per block sequence. The heads and tails
of these lists(or chains) are maintained in the metapage (and cached in
the revmap access struct).

Since revmap pages are no longer contiguous for AO/CO tables, we have to
additionally maintain logical page numbers (in the BrinSpecialSpace)
for all revmap pages (depicted in the diagram above). These logical
page numbers are used for both iterating over the revmap during scans
and also while extending the revmap.

We traverse these lists in order within a block sequence and block
sequence by block sequence.

We never have to lock more than 1 revmap page at a time during chain
traversal. Only for revmap extension, do we have to lock two revmap
pages: the last revmap page in the chain and the new revmap page being
added.

For operations such as insert, we make use of the chain tail pointer in
the metapage. Due to the appendonly nature of AO/CO tables, we would
always write to the last logical heap block within a block sequence.
Thus, unlike for heap, blocks other than the last block would never be
summarized as a result of an insert. So, we can safely position the
revmap iterator at the end of the chain(instead of traversing the chain
unnecessarily from the front).

(2) pageinspect and waldump have been modified in accordance with these
changes.

(3) Whitebox tests have been added for all BRIN operations, with the
exception of desummarize. These tests utilize pageinspect.

(4) WAL changes: Catalog bump is performed as we can't change
XLOG_PAGE_MAGIC, in order to avoid future merge conflicts.

(5) Created 202_wal_consistency_brin.pl under src/test/recovery as a
replica of src/test/modules/brin/t/02_wal_consistency.pl, with added
tests for AO/CO tables (since src/test/modules is excluded from CI)

Note: Please refer to the updated README for more details.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-pick cherry-pick upstream commts type: catalog_change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants