Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ao related commits cherry-pick #884

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Conversation

reshke
Copy link
Contributor

@reshke reshke commented Jan 21, 2025

Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


divyeshddv and others added 5 commits January 21, 2025 19:19
This change is an extension of the PR #13246.
The commit c121bcd83c9 fixes the bug for
ALTER TABLE EXPAND TABLE and ALTER TABLE SET DISTRIBUTED BY
operations on AO tables with indexes.

These operations on AO tables with indexes are executed with CT + IIS
whereas on AO tables without indexes which are executed with CTAS

The bug about not preserving storage options is also present
for ALTER TABLE on AO tables without indexes case.
This change fixes that bug by adding the storage options preservation
logic to AT calls for AO tables which go through an internal CTAS.

Co-authored-by: Soumyadeep Chakraborty <[email protected]>
follow up #13213 and #13221.

dispatch temp namespace oid in every query. on writer gang skips set
temp namespace oid, the writer gang will do InitTempTableNamespace().
on reader gang always use the oid from query message, allow temp NS
oid change dynamically.

temp namespace oid will stale after creating temp NS rollback. it
should be fine, new oid will be dispatched in the next sql.
Since this PR #11377 has fixed bitmap index scan when concurrent insert update full
bitmap page, which resolved concurrent read on bitmap index when there's insert running
in the backend may cause the bitmap scan read wrong tid:

1. Query on bitmap: A query starts and reads all bitmap pages to PAGE_FULL, increases
the next tid to fetch, and releases the lock after reading each page.
2. Concurrent insert: insert a tid into PAGE_FULL cause expand compressed words to
new words, and rearrange words into PAGE_NEXT.
3. Query on bitmap: fetch PAGE_NEXT and expect the first tid in it should equal the
saved next tid. But actually PAGE_NEXT now contains words used to belong in PAGE_FULL.
This causes the real next tid less than the expected next tid. But our scan keeps increasing
the wrong tid. And then this leads to a wrong result.

The PR used _bitmap_catchup_to_next_tid() function to adjust result->lastScanWordNo
to the correct position if there has concurrent read/write and causes rearranging words into the
next bitmap index page, it used the value of words->firstTid and result->nextTid to judge
whether rearranging words into the next bitmap index page happened or not, this is not entirely
right.

This is because BMIterateResult can only store 16*1024=16384 TIDs, but BMBatchWords
can save 3968 words by default, a words is 64bit, even in the worst case (all words not
compressed) , BMBatchWords can hold 3968 * 64 = 253952 TIDs. So if there has a PAGE_FULL,
the PAGE_FULL must scan it more than one time, but the value of BMBatchWords->firstTid will
not be updated during each scan, it will only be updated when new bitmap index pages are read.

Therefore, in the absence of concurrent read/write, if we need to scan the same BMBatchWords
multiple times, it will lead to the wrong scan position, resulting in wrong output results, as #13446.

To summarize, we just need to check for a rearranged condition when the new bitmap index page is
read from disk, we should do nothing when we scan the same BMBatchWords.
The case prevent_ao_wal will check XLOG entries and before the
check it invokes VACUUM without table names so this vacuum will
operates on tables in the whole database. If previous cases leave
some Append Only Tables, then there might be more XLOG entries
to make the case failure.

This commit fixes the issue by adding table names in the case.
In commit a0684da we made a change to use the fixed TOAST_TUPLE_TARGET
instead of custom toast_tuple_target that's calculated based on the the
blocksize of the AO table for toasting AO table tuples.

This caused an issue that some tuples that are well beyond the
toast_tuple_target are not being toasted (because they're smaller than
TOAST_TUPLE_TARGET). This is ok when the tuples can still fit into the
AO table's blocksize. But if not, an error would occur. E.g.:

  postgres=# create table t (a int, b int[]) WITH (appendonly=true, blocksize=8192);
  CREATE TABLE
  postgres=# insert into t select 1, array_agg(x) from generate_series(1, 2030) x;
  ERROR:  item too long (check #1): length 8160, maxBufferLen 8168
  CONTEXT:  Append-Only table 't'

Therefore, we should revert those changes, including:

1. Still use the toast_tuple_target for toasting 'x' and 'e' columns for AO tables.
There's also a GPDB_12_MERGE_FIXME in the original comment suggesting to use
RelationGetToastTupleTarget for AO table (in order to unify the logic to calculate
maxDataLen for heap and AO tuples, I suppose). But that's not possible currently
because the calculated toast_tuple_target is stored in AppendOnlyInsertDesc and
we cannot get it from the Relation struct. And it seems to me that we won't have
a way to do that easily. Therefore, keep this FIXME comment being removed.

2. Still use the toast_tuple_target for toasting 'm' columns for AO tables. Also restore
the FIXME comment suggesting that we should use a bigger target size for 'm' columns.
This should be something that worth investigating more into.

Commit a0684da also includes a change to use custom toast_tuple_target for heap tables,
which should be a valid change. I think that fixed an oversight when merging the upstream
commit c251336.
@reshke reshke added the cherry-pick cherry-pick upstream commts label Jan 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-pick cherry-pick upstream commts
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants