Vectorized aggregation with grouping by one fixed-size column #7341
Open: akuzm wants to merge 66 commits into timescale:main from akuzm:hash-simple
66 commits
b92e622
Vectorized hash grouping on one column
akuzm 4ce0e99
Merge remote-tracking branch 'origin/main' into HEAD
akuzm 74d4419
benchmark vectorized grouping (2024-10-02 no. 6)
akuzm baedf7f
fixes
akuzm 35dbd36
benchmark vectorized grouping (2024-10-02 no. 7)
akuzm 74fffd3
some ugly stuff
akuzm f8db454
benchmark vectorized grouping (2024-10-02 no. 9)
akuzm 00a9d11
someething
akuzm 339f91a
reduce indirections
akuzm f075589
skip null bitmap words
akuzm 88f325d
cleanup
akuzm 15ab443
crc32
akuzm ff16ec8
license
akuzm 4291b17
benchmark vectorized hash grouping (2024-10-09 no. 10)
akuzm 795ef6b
test deltadelta changes
akuzm 1fabb22
some speedups and simplehash simplifications
akuzm 717abc4
Revert "test deltadelta changes"
akuzm b03bd6b
test deltadelta changes
akuzm 166d0e8
work with signed types
akuzm 7f578b4
Revert "work with signed types"
akuzm e70cb0b
bulk stuff specialized to element type
akuzm 0040844
roll back the delta delta stuff
akuzm 694faf6
use simplehash
akuzm 3d05674
cleanup
akuzm d90a90f
benchmark vectorized hash grouping (simple) (2024-10-14 no. 11)
akuzm 4a93549
add more tests
akuzm 3e06b92
remove modified simplehash
akuzm a7942ed
offsets
akuzm 6fb517f
cleanup
akuzm ffb28cf
changelog
akuzm 778ca97
cleanup
akuzm ef3847a
benchmark vectorized hash grouping (simple) (2024-10-15 no. 12)
akuzm 1409c74
32-bit
akuzm 514ae96
some renames
akuzm 22d23b3
cleanup
akuzm cd7a1dc
spelling
akuzm 9ebd61f
Merge remote-tracking branch 'origin/main' into HEAD
akuzm 9e51c19
Vectorize aggregate FILTER clause
akuzm 480d0fe
Merge remote-tracking branch 'origin/main' into HEAD
akuzm 9b0ee38
cleanups after merge
akuzm effa7eb
cleanup
akuzm 533be01
Merge remote-tracking branch 'origin/main' into HEAD
akuzm 8e6c6d2
changelog
akuzm b717f74
constify stable expressions
akuzm 4df06d9
Merge commit '155ca6f7ef2925735c7063cd9178edd185c17009' into HEAD
akuzm 47bcaa9
updates
akuzm b6cee02
remove extras
akuzm ecb1aec
ref
akuzm f64676f
fixes
akuzm fab11fb
benchmark single fixed-column hash grouping (2024-12-03 no. 11)
akuzm dff6dff
cleanup
akuzm 831cadd
planning fixes for pg 17
akuzm 66403f2
benchmark fixed-size hash grouping (2024-12-04 no. 152)
akuzm 99e5b04
remove some (yet) unused code
akuzm de22a22
Merge remote-tracking branch 'origin/main' into HEAD
akuzm 9fccab9
ref
akuzm 8e97c2f
Merge remote-tracking branch 'akuzm/vector-filter' into HEAD
akuzm f5b648a
add test
akuzm ecd9cb2
Merge remote-tracking branch 'origin/main' into HEAD
akuzm dc6001d
typo
akuzm 0ea397a
disable parallel
akuzm ea4dab1
add order
akuzm 4b98e46
Update tsl/src/nodes/vector_agg/grouping_policy_hash.h
akuzm b615dbe
Update tsl/src/nodes/vector_agg/grouping_policy_hash.h
akuzm 045f59a
determine the grouping type at plan time
akuzm 10e66ad
Merge remote-tracking branch 'origin/main' into HEAD
akuzm
@@ -0,0 +1 @@
Implements: #7341 Vectorized aggregation with grouping by one fixed-size by-value compressed column (such as arithmetic types).
@@ -1,6 +1,8 @@
add_subdirectory(function)
add_subdirectory(hashing)
set(SOURCES
    ${CMAKE_CURRENT_SOURCE_DIR}/exec.c
    ${CMAKE_CURRENT_SOURCE_DIR}/grouping_policy_batch.c
    ${CMAKE_CURRENT_SOURCE_DIR}/grouping_policy_hash.c
    ${CMAKE_CURRENT_SOURCE_DIR}/plan.c)
target_sources(${TSL_LIBRARY_NAME} PRIVATE ${SOURCES})
tsl/src/nodes/vector_agg/function/agg_many_vector_helper.c
58 changes: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
/*
 * This file and its contents are licensed under the Timescale License.
 * Please see the included NOTICE for copyright information and
 * LICENSE-TIMESCALE for a copy of the license.
 */

/*
 * A generic implementation of adding the given batch to many aggregate function
 * states with given offsets. Used for hash aggregation, and builds on the
 * FUNCTION_NAME(one) function, which adds one passing non-null row to the given
 * aggregate function state.
 */
static pg_attribute_always_inline void
FUNCTION_NAME(many_vector_impl)(void *restrict agg_states, const uint32 *offsets,
								const uint64 *filter, int start_row, int end_row,
								const ArrowArray *vector, MemoryContext agg_extra_mctx)
{
	FUNCTION_NAME(state) *restrict states = (FUNCTION_NAME(state) *) agg_states;
	const CTYPE *values = vector->buffers[1];
	MemoryContext old = MemoryContextSwitchTo(agg_extra_mctx);
	for (int row = start_row; row < end_row; row++)
	{
		const CTYPE value = values[row];
		FUNCTION_NAME(state) *restrict state = &states[offsets[row]];
		if (arrow_row_is_valid(filter, row))
		{
			Assert(offsets[row] != 0);
			FUNCTION_NAME(one)(state, value);
		}
	}
	MemoryContextSwitchTo(old);
}

static pg_noinline void
FUNCTION_NAME(many_vector_all_valid)(void *restrict agg_states, const uint32 *offsets,
									 int start_row, int end_row, const ArrowArray *vector,
									 MemoryContext agg_extra_mctx)
{
	FUNCTION_NAME(many_vector_impl)
	(agg_states, offsets, NULL, start_row, end_row, vector, agg_extra_mctx);
}

static void
FUNCTION_NAME(many_vector)(void *restrict agg_states, const uint32 *offsets, const uint64 *filter,
						   int start_row, int end_row, const ArrowArray *vector,
						   MemoryContext agg_extra_mctx)
{
	if (filter == NULL)
	{
		FUNCTION_NAME(many_vector_all_valid)
		(agg_states, offsets, start_row, end_row, vector, agg_extra_mctx);
	}
	else
	{
		FUNCTION_NAME(many_vector_impl)
		(agg_states, offsets, filter, start_row, end_row, vector, agg_extra_mctx);
	}
}
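To make the offset-based dispatch in the helper above easier to follow, here is a minimal standalone sketch of the same pattern, specialized by hand to a sum over 32-bit integers. Everything in it is an assumption made for illustration and not the PR's code: `sum_int4_state`, `sum_int4_one`, `sum_int4_many_vector` and `row_is_valid` are hypothetical stand-ins for `FUNCTION_NAME(state)`, `FUNCTION_NAME(one)`, `FUNCTION_NAME(many_vector)` and `arrow_row_is_valid`, the values are passed as a plain array instead of an `ArrowArray`, and the memory context switch is left out. It also assumes that a NULL filter means every row passes and that offset 0 is never assigned to a passing row, as the `Assert` in the diff suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-group aggregate state: a running sum of int32 values. */
typedef struct
{
	int64_t result;
	bool isvalid;
} sum_int4_state;

/*
 * Simplified stand-in for arrow_row_is_valid(): one bit per row, 64 rows per
 * word. Assumption: a NULL bitmap means that all rows pass.
 */
static inline bool
row_is_valid(const uint64_t *bitmap, size_t row)
{
	if (bitmap == NULL)
		return true;
	return (bitmap[row / 64] >> (row % 64)) & 1;
}

/* Stand-in for FUNCTION_NAME(one): add one passing non-null row to a state. */
static inline void
sum_int4_one(sum_int4_state *state, int32_t value)
{
	state->result += value;
	state->isvalid = true;
}

/*
 * Stand-in for FUNCTION_NAME(many_vector): every passing row in the half-open
 * range [start_row, end_row) is added to the state at states[offsets[row]].
 */
static void
sum_int4_many_vector(sum_int4_state *states, const uint32_t *offsets,
					 const uint64_t *filter, int start_row, int end_row,
					 const int32_t *values)
{
	for (int row = start_row; row < end_row; row++)
	{
		if (row_is_valid(filter, (size_t) row))
		{
			/* Assumed convention: offset 0 is not used for passing rows. */
			assert(offsets[row] != 0);
			sum_int4_one(&states[offsets[row]], values[row]);
		}
	}
}
```

For example, with values `{10, 20, 30, 40, 50}`, offsets `{1, 2, 0, 1, 2}` and a filter word of `0x1B` (row 2 filtered out), calling the sketch on rows `[0, 5)` leaves `states[1].result` at 50 and `states[2].result` at 70, while `states[0]` is never touched.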
Just double-checking that it is really correct to be non-inclusive with the `end_row` here... `end_row` sounds like it is a "valid" row index, as opposed to using something like `num_rows` in a zero-indexed series. If `end_row` is not a valid index, it should probably be called something else.

I guess that's the C++ habit of mine, where `end` is idiomatically a past-the-end invalid iterator. In general, I think ranges with an exclusive right end are very common, like `[begin, end)`. Do you have a better name for this? Sometimes I write `past_the_end_row` to make it absolutely clear, but this feels a little too long for common usage...

Yeah, so what I suggested is `num_rows`, which has clear semantics. Then you can have a "standard" for-loop on `i` (starting at 0) and just get the row from `start_row + i`. Not super-important, but the semantics are a bit more clear. Feel free to decide yourself what works best...
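
The two conventions discussed in this thread can be put side by side in a small sketch. It is illustrative only and not part of the PR: `sum_rows_half_open` and `sum_rows_counted` are hypothetical names, and the loop bodies just sum values so that the two forms can be compared directly.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Half-open form: visits rows in [start_row, end_row). Here end_row is one
 * past the last row visited, so it is not itself a valid row index.
 */
static int64_t
sum_rows_half_open(const int32_t *values, int start_row, int end_row)
{
	assert(start_row <= end_row);
	int64_t sum = 0;
	for (int row = start_row; row < end_row; row++)
		sum += values[row];
	return sum;
}

/*
 * Count form: visits num_rows rows, starting at start_row, with a loop
 * counter that runs from 0.
 */
static int64_t
sum_rows_counted(const int32_t *values, int start_row, int num_rows)
{
	assert(num_rows >= 0);
	int64_t sum = 0;
	for (int i = 0; i < num_rows; i++)
		sum += values[start_row + i];
	return sum;
}
```

The two are interchangeable through `num_rows = end_row - start_row`. The half-open form splits cleanly, since `[a, m)` followed by `[m, b)` covers `[a, b)` with no overlap, while the count form makes it harder to mistake the upper bound for a valid row index.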