[enhancement]Add comments to block.h and add unit tests #44083

Open
wants to merge 42 commits into
base: master
Changes from 4 commits
42 commits
dd0173c
add comments for block.h
Yoruet Nov 15, 2024
e961aaf
add more comments to block.h and add some unit tests for block in blo…
Yoruet Nov 17, 2024
6a04516
code format
Yoruet Nov 17, 2024
05f564f
code format
Yoruet Nov 17, 2024
c73bc01
code format
Yoruet Nov 17, 2024
658cf55
code format
Yoruet Nov 17, 2024
121304d
code format
Yoruet Nov 17, 2024
99c1eae
fix bug in filterandselector
Yoruet Nov 17, 2024
173aeb9
code format
Yoruet Nov 17, 2024
8b8728d
update the comments of the block.h
Yoruet Nov 19, 2024
cf8ab08
add corner case for BlockTest.Constructor
Yoruet Nov 25, 2024
cf88247
add corner cases for BlockTest.BasicOperations
Yoruet Nov 25, 2024
a44abe1
Refactor and format block_test.cpp for improved readability and consi…
Yoruet Nov 25, 2024
77694e3
Remove unnecessary whitespace in block_test.cpp to enhance code clarity
Yoruet Nov 25, 2024
07712ac
Add death tests for invalid erase operations in BlockTest
Yoruet Nov 26, 2024
a717ea7
Enhance BlockTest with comprehensive tests for empty, const, and null…
Yoruet Nov 26, 2024
26b9f3f
Refactor whitespace in block_test.cpp to improve code readability and…
Yoruet Nov 26, 2024
dc1e232
Remove unnecessary whitespace in block_test.cpp to enhance code clari…
Yoruet Nov 26, 2024
0b83471
Enhance BlockTest with additional tests for row operations, including…
Yoruet Nov 26, 2024
a75537e
Enhance BlockTest with comprehensive tests for empty, const, nullable…
Yoruet Nov 29, 2024
8c9e988
Enhance BlockTest with extensive tests for empty, const, nullable, an…
Yoruet Nov 29, 2024
fb88229
Refactor whitespace in block_test.cpp to improve readability and main…
Yoruet Nov 29, 2024
a60a460
Enhance BlockTest with extensive tests for empty, regular, const, and…
Yoruet Nov 29, 2024
ee81171
Enhance BlockTest with comprehensive tests for filtering operations o…
Yoruet Dec 2, 2024
e17870e
Enhance BlockTest with extensive tests for row operations on empty, r…
Yoruet Dec 2, 2024
2899971
Enhance BlockTest with extensive tests for clearing column data acros…
Yoruet Dec 8, 2024
ff6abd9
Enhance BlockTest with comprehensive tests for index operations on em…
Yoruet Dec 17, 2024
628726f
Remove obsolete test for ReplaceIfOverflow from BlockTest to streamli…
Yoruet Dec 17, 2024
c30bc59
Enhance BlockTest with comprehensive tests for shuffling operations o…
Yoruet Dec 17, 2024
72569f3
Enhance BlockTest with comprehensive tests for hash updates on empty,…
Yoruet Dec 17, 2024
11263c3
Enhance BlockTest with comprehensive tests for erase operations, comp…
Yoruet Dec 18, 2024
cb87c0b
Enhance BlockTest with comprehensive tests for string operations, inc…
Yoruet Dec 18, 2024
00f7d1a
code format
Yoruet Dec 18, 2024
ab454cd
code format
Yoruet Dec 18, 2024
9ec8a8c
code format
Yoruet Dec 18, 2024
037e553
code format
Yoruet Dec 18, 2024
84a859a
code format
Yoruet Dec 18, 2024
48076f6
code format
Yoruet Dec 18, 2024
52e7a7b
code format
Yoruet Dec 18, 2024
47ef29c
code format
Yoruet Dec 18, 2024
1ce54ee
Fix BlockTest assertion to check for non-null ColumnConst in sorted b…
Yoruet Dec 18, 2024
11874f9
Merge branch 'master' into block
Yoruet Feb 3, 2025
72 changes: 69 additions & 3 deletions be/src/vec/core/block.h
@@ -95,8 +95,9 @@ class Block {
Block(Block&& block) = default;
Block& operator=(Block&& other) = default;

/// Reserve memory for internal containers
void reserve(size_t count);
// Make sure the nammes is useless when use block
/// Clear all column names and name index mappings in the block
Yoruet marked this conversation as resolved.
void clear_names();

/// insert the column at the specified position
@@ -123,6 +124,7 @@ class Block {
std::swap(data, new_data);
}

/// Initialize the index by name map
void initialize_index_by_name();

/// References are invalidated after calling functions above.
@@ -133,51 +135,69 @@ class Block {
}
const ColumnWithTypeAndName& get_by_position(size_t position) const { return data[position]; }

/// Replace column at position with rvalue column pointer
void replace_by_position(size_t position, ColumnPtr&& res) {
this->get_by_position(position).column = std::move(res);
}

/// Replace column at position with lvalue column pointer
void replace_by_position(size_t position, const ColumnPtr& res) {
this->get_by_position(position).column = res;
}

/// Convert const column at position to full column if it is const
void replace_by_position_if_const(size_t position) {
auto& element = this->get_by_position(position);
element.column = element.column->convert_to_full_column_if_const();
}

/// Convert all columns to new columns if they overflow
void replace_if_overflow() {
for (auto& ele : data) {
ele.column = std::move(*ele.column).mutate()->convert_column_if_overflow();
}
}

// get column by position, throw exception when position is invalid
ColumnWithTypeAndName& safe_get_by_position(size_t position);
const ColumnWithTypeAndName& safe_get_by_position(size_t position) const;

// get column by name, throw exception when no such column name
ColumnWithTypeAndName& get_by_name(const std::string& name);
const ColumnWithTypeAndName& get_by_name(const std::string& name) const;

// return nullptr when no such column name
ColumnWithTypeAndName* try_get_by_name(const std::string& name);
const ColumnWithTypeAndName* try_get_by_name(const std::string& name) const;
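
The comments above document two lookup styles: `get_by_name` throws when the column is missing, while `try_get_by_name` returns `nullptr`. A minimal sketch of that contract follows. It assumes hypothetical stand-in types (`NamedColumn`, `NamedColumns`) and uses `std::out_of_range` in place of whatever exception the real `Block` throws, which this diff does not show.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for ColumnWithTypeAndName: a name plus a payload.
struct NamedColumn {
    std::string name;
    int value;
};

// Hypothetical stand-in for Block, reduced to name lookup.
struct NamedColumns {
    std::vector<NamedColumn> data;

    // Non-throwing lookup: returns nullptr when no column has this name.
    const NamedColumn* try_get_by_name(const std::string& name) const {
        for (const auto& column : data) {
            if (column.name == name) {
                return &column;
            }
        }
        return nullptr;
    }

    // Throwing lookup: the exception type here is an assumption.
    const NamedColumn& get_by_name(const std::string& name) const {
        const NamedColumn* found = try_get_by_name(name);
        if (found == nullptr) {
            throw std::out_of_range("no such column: " + name);
        }
        return *found;
    }

    // Existence check built on the non-throwing lookup.
    bool has(const std::string& name) const { return try_get_by_name(name) != nullptr; }
};
```

Callers that can recover from a missing column would use the pointer-returning form and test for `nullptr`; the throwing form suits code where a missing column is a logic error.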

/// Get an iterator to the beginning of the data container
Container::iterator begin() { return data.begin(); }
/// Get an iterator to the end of the data container
Container::iterator end() { return data.end(); }
/// Get a constant iterator to the beginning of the data container
Container::const_iterator begin() const { return data.begin(); }
/// Get a constant iterator to the end of the data container
Container::const_iterator end() const { return data.end(); }
/// Get a constant iterator to the beginning of the data container
Container::const_iterator cbegin() const { return data.cbegin(); }
/// Get a constant iterator to the end of the data container
Container::const_iterator cend() const { return data.cend(); }

// check if the column name exists
bool has(const std::string& name) const;

// get the position of the column by name
size_t get_position_by_name(const std::string& name) const;

// get the columns with type and name
const ColumnsWithTypeAndName& get_columns_with_type_and_name() const;

// Returns a vector containing all column names in the block
std::vector<std::string> get_names() const;
// Returns a vector containing all column data types in the block
DataTypes get_data_types() const;

// Returns the data type of the column at the specified index
DataTypePtr get_data_type(size_t index) const {
CHECK(index < data.size());
return data[index].type;
@@ -186,6 +206,8 @@ class Block {
/// Returns number of rows from first column in block, not equal to nullptr. If no columns, returns 0.
size_t rows() const;

// Returns a string showing the size of each column, separated by ' | '
// A null column is reported as -1
std::string each_col_size() const;

// Cut the rows in block, use in LIMIT operation
@@ -204,6 +226,7 @@ class Block {
/// Approximate number of bytes in memory - for profiling and limits.
size_t bytes() const;

/// Get a string with the size of each column in bytes.
std::string columns_bytes() const;

/// Approximate number of allocated bytes in memory - for profiling and limits.
@@ -212,6 +235,7 @@ class Block {
/** Get a list of column names separated by commas. */
std::string dump_names() const;

/** Get a list of column types separated by commas. */
std::string dump_types() const;

/** List of names, types and lengths of columns. Designed for debugging. */
@@ -220,11 +244,16 @@ class Block {
/** Get the same block, but empty. */
Block clone_empty() const;

/// Returns a copy of all columns, converting const columns to full columns
Columns get_columns() const;
/// Returns all columns and converts const columns to full columns in place
Columns get_columns_and_convert();

/// Set the columns of the block.
void set_columns(const Columns& columns);
/// Clone the block with the specified columns.
Block clone_with_columns(const Columns& columns) const;
/// Clone the block with the specified column offset but without data.
Block clone_without_columns(const std::vector<int>* column_offset = nullptr) const;

/** Get empty columns with the same types as in block. */
@@ -251,10 +280,14 @@ class Block {
// Else clear column [0, column_size) delete column [column_size, data.size)
void clear_column_data(int64_t column_size = -1) noexcept;

// Check if the block has at least one column.
bool mem_reuse() { return !data.empty(); }

// Check if the block has no columns
bool is_empty_column() { return data.empty(); }

// Check if the block has no rows (i.e. all columns have 0 rows)
// This is different from is_empty_column() which checks for absence of columns
bool empty() const { return rows() == 0; }
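
The distinction between `empty()` and `is_empty_column()` is easy to miss, so here is a minimal sketch of it. `ToyBlock` and `check_invariant` are hypothetical; columns are plain `std::vector<int>` and null columns are ignored, which the real `rows()` has to handle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a block: each inner vector is one column of ints.
struct ToyBlock {
    std::vector<std::vector<int>> columns;

    // Row count of the first column, or 0 when there are no columns.
    std::size_t rows() const { return columns.empty() ? 0 : columns.front().size(); }

    // True when the block has no columns at all.
    bool is_empty_column() const { return columns.empty(); }

    // True when the block has no rows, even if it still has columns.
    bool empty() const { return rows() == 0; }
};

// A block without columns necessarily has no rows; the converse does not hold.
inline void check_invariant(const ToyBlock& block) {
    assert(!block.is_empty_column() || block.empty());
}
```

A block that keeps its schema but holds zero rows is `empty()` while `is_empty_column()` is false, which is the case the new comment calls out.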

/**
@@ -284,6 +317,8 @@ class Block {
// copy a new block by the offset column
Block copy_block(const std::vector<int>& column_offset) const;

// appends selected rows from this block to destination block based on selector
// skips const columns during append operation
Status append_to_block_by_selector(MutableBlock* dst, const IColumn::Selector& selector) const;

// need exception safety
@@ -295,11 +330,14 @@ class Block {
// need exception safety
static void filter_block_internal(Block* block, const IColumn::Filter& filter);

// Filter block by specified columns using filter column
static Status filter_block(Block* block, const std::vector<uint32_t>& columns_to_filter,
size_t filter_column_id, size_t column_to_keep);

// Filter block using filter column
static Status filter_block(Block* block, size_t filter_column_id, size_t column_to_keep);

// Keep the first column_to_keep columns and remove the rest
static void erase_useless_column(Block* block, size_t column_to_keep) {
block->erase_tail(column_to_keep);
}
@@ -309,8 +347,10 @@ class Block {
size_t* compressed_bytes, segment_v2::CompressionTypePB compression_type,
bool allow_transfer_large_data = false) const;

// deserialize block from PBlock
Status deserialize(const PBlock& pblock);

// Create empty block with same schema
std::unique_ptr<Block> create_same_struct_block(size_t size, bool is_reserve = false) const;

/** Compares (*this) n-th row and rhs m-th row.
@@ -329,6 +369,7 @@ class Block {
return compare_at(n, m, columns(), rhs, nan_direction_hint);
}

// Compare rows by first num_columns columns in sequential order (from index 0 to num_columns - 1)
int compare_at(size_t n, size_t m, size_t num_columns, const Block& rhs,
int nan_direction_hint) const {
DCHECK_GE(columns(), num_columns);
@@ -347,6 +388,7 @@ class Block {
return 0;
}
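
The sequential comparison described in the comment above can be sketched as follows. The loop body is elided in this hunk, so its shape here is an assumption: compare column by column and return the first non-zero result. `IntColumns`, `compare_values`, and `compare_rows` are hypothetical names, and `nan_direction_hint` is dropped because the toy columns hold ints.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a block: a list of int columns.
using IntColumns = std::vector<std::vector<int>>;

// Three-way compare of two ints: negative, zero, or positive.
inline int compare_values(int a, int b) {
    return (a > b) - (a < b);
}

// Compare row n of lhs with row m of rhs over columns [0, num_columns).
// The first column whose values differ decides the result.
inline int compare_rows(const IntColumns& lhs, std::size_t n, const IntColumns& rhs,
                        std::size_t m, std::size_t num_columns) {
    // Mirrors the DCHECK_GE(columns(), num_columns) precondition.
    assert(num_columns <= lhs.size());
    assert(num_columns <= rhs.size());
    for (std::size_t i = 0; i < num_columns; ++i) {
        int res = compare_values(lhs[i].at(n), rhs[i].at(m));
        if (res != 0) {
            return res;
        }
    }
    return 0;
}
```

Because the comparison stops at the first differing column, column order acts as sort-key priority: rows that tie on column 0 are ordered by column 1, and so on.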

// Compare rows by specified columns in compare_columns
int compare_at(size_t n, size_t m, const std::vector<uint32_t>* compare_columns,
const Block& rhs, int nan_direction_hint) const {
DCHECK_GE(columns(), compare_columns->size());
@@ -377,24 +419,30 @@ class Block {
// for String type or Array<String> type
void shrink_char_type_column_suffix_zero(const std::vector<size_t>& char_type_idx);

// Get time spent on decompression in nanoseconds
int64_t get_decompress_time() const { return _decompress_time_ns; }
// Get total bytes after decompression
int64_t get_decompressed_bytes() const { return _decompressed_bytes; }
// Get time spent on compression in nanoseconds
int64_t get_compress_time() const { return _compress_time_ns; }

// Append same bit flags for rows in block; afterwards the flag count must equal rows()
void set_same_bit(std::vector<bool>::const_iterator begin,
std::vector<bool>::const_iterator end) {
row_same_bit.insert(row_same_bit.end(), begin, end);

DCHECK_EQ(row_same_bit.size(), rows());
}

// Get same bit flag for specified row position; returns false if position is out of range
bool get_same_bit(size_t position) {
if (position >= row_same_bit.size()) {
return false;
}
return row_same_bit[position];
}

// Clear all same bit flags
void clear_same_bit() { row_same_bit.clear(); }
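
The three same-bit helpers above append, read, and clear a `std::vector<bool>`. A minimal sketch of that behavior, assuming a hypothetical `SameBitFlags` wrapper in which an explicit `expected_rows` parameter stands in for the `rows()` call used by the `DCHECK_EQ`:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the row_same_bit bookkeeping inside Block.
struct SameBitFlags {
    std::vector<bool> row_same_bit;

    // Appends flags to the existing ones; it does not overwrite them.
    // expected_rows is an assumption replacing the real rows() check.
    void set_same_bit(std::vector<bool>::const_iterator begin,
                      std::vector<bool>::const_iterator end, std::size_t expected_rows) {
        row_same_bit.insert(row_same_bit.end(), begin, end);
        assert(row_same_bit.size() == expected_rows);
    }

    // Out-of-range positions read as false instead of failing.
    bool get_same_bit(std::size_t position) const {
        if (position >= row_same_bit.size()) {
            return false;
        }
        return row_same_bit[position];
    }

    // Drops all flags.
    void clear_same_bit() { row_same_bit.clear(); }
};
```

Since `set_same_bit` appends, calling it twice without `clear_same_bit()` in between accumulates flags, which is why the size check against the row count matters.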

// return string contains use_count() of each columns
@@ -406,6 +454,7 @@ class Block {
// we built some temporary columns into block
void erase_tmp_columns() noexcept;

// Clear columns not marked for keeping
void clear_column_mem_not_keep(const std::vector<bool>& column_keep_flags,
bool need_keep_first);

@@ -480,6 +529,7 @@ class MutableBlock {
return _data_types[position];
}

// Compare rows by specified column
int compare_one_column(size_t n, size_t m, size_t column_id, int nan_direction_hint) const {
DCHECK_LE(column_id, columns());
DCHECK_LE(n, rows());
@@ -488,6 +538,7 @@ class MutableBlock {
return column->compare_at(n, m, *column, nan_direction_hint);
}

// Compare rows by first num_columns columns in sequential order (from index 0 to num_columns - 1)
int compare_at(size_t n, size_t m, size_t num_columns, const MutableBlock& rhs,
int nan_direction_hint) const {
DCHECK_GE(columns(), num_columns);
@@ -506,6 +557,7 @@ class MutableBlock {
return 0;
}

// Compare rows by specified columns in compare_columns
int compare_at(size_t n, size_t m, const std::vector<uint32_t>* compare_columns,
const MutableBlock& rhs, int nan_direction_hint) const {
DCHECK_GE(columns(), compare_columns->size());
@@ -524,6 +576,7 @@ class MutableBlock {
return 0;
}

// Get a string representation of the block's data types
std::string dump_types() const {
std::string res;
for (auto type : _data_types) {
@@ -565,6 +618,7 @@ class MutableBlock {
return Status::OK();
}

// Merge another block into current block with strict type check and overflow handling.
template <typename T>
[[nodiscard]] Status merge_impl(T&& block) {
// merge is not supported in dynamic block
@@ -613,12 +667,14 @@ class MutableBlock {
return Status::OK();
}

// move to columns' data to a Block. this will invalidate
// Move the data of columns to a block. This will invalidate the MutableBlock.
Block to_block(int start_column = 0);
Block to_block(int start_column, int end_column);

// Swap the contents of two MutableBlocks
void swap(MutableBlock& other) noexcept;

// Move-swap the contents of two MutableBlocks
void swap(MutableBlock&& other) noexcept;

void add_row(const Block* block, int row);
@@ -628,11 +684,13 @@ class MutableBlock {
Status add_rows(const Block* block, size_t row_begin, size_t length);
Status add_rows(const Block* block, const std::vector<int64_t>& rows);

/// remove the column with the specified name
/// Remove the column with the specified name
void erase(const String& name);

// Get a string representation of the block's data, limited to the specified number of rows
std::string dump_data(size_t row_limit = 100) const;

// Clear the block's data
void clear() {
_columns.clear();
_data_types.clear();
@@ -644,8 +702,10 @@ class MutableBlock {
// reset columns by types and names.
void reset_column_data() noexcept;

// Returns the total number of bytes allocated by all columns in the block
size_t allocated_bytes() const;

// Returns the approximate number of bytes in memory used by the block
size_t bytes() const {
size_t res = 0;
for (const auto& elem : _columns) {
@@ -655,16 +715,20 @@ class MutableBlock {
return res;
}

// Get the names of the columns in the block
std::vector<std::string>& get_names() { return _names; }

// Check if the block contains a column with the specified name
bool has(const std::string& name) const;

// Get the position of the column with the specified name
size_t get_position_by_name(const std::string& name) const;

/** Get a list of column names separated by commas. */
std::string dump_names() const;

private:
// Initialize the index by name map
void initialize_index_by_name();
};

@@ -673,11 +737,13 @@ struct IteratorRowRef {
int row_pos;
bool is_same;

// Compare rows by specified arguments
template <typename T>
int compare(const IteratorRowRef& rhs, const T& compare_arguments) const {
return block->compare_at(row_pos, rhs.row_pos, compare_arguments, *rhs.block, -1);
}

// Reset the IteratorRowRef to default values
void reset() {
block = nullptr;
row_pos = -1;
2 changes: 2 additions & 0 deletions be/test/olap/rowset/beta_rowset_test.cpp
@@ -236,6 +236,8 @@ TEST_F(BetaRowsetTest, ReadTest) {
.region = "region",
.ak = "ak",
.sk = "sk",
.token = "token",
.bucket = "bucket",
}};
std::string resource_id = "10000";
auto res = io::S3FileSystem::create(std::move(s3_conf), io::FileSystem::TMP_FS_ID);
4 changes: 2 additions & 2 deletions be/test/vec/aggregate_functions/agg_linear_histogram_test.cpp
@@ -204,8 +204,8 @@ class AggLinearHistogramTest : public testing::Test {
<< "(" << data_types[0]->get_name() << ")";

AggregateFunctionSimpleFactory factory = AggregateFunctionSimpleFactory::instance();
auto agg_function =
factory.get("linear_histogram", data_types, false, -1, {.enable_decimal256 = true});
auto agg_function = factory.get("linear_histogram", data_types, false, -1,
{.enable_decimal256 = true, .column_infos = {}});
EXPECT_NE(agg_function, nullptr);

std::unique_ptr<char[]> memory(new char[agg_function->size_of_data()]);