Minor: Doc and organize fields in struct ExternalSorter #13447

Merged
merged 1 commit into from
Nov 16, 2024
43 changes: 29 additions & 14 deletions datafusion/physical-plan/src/sorts/sort.rs
@@ -203,39 +203,54 @@ impl ExternalSorterMetrics {
/// in_mem_batches
/// ```
struct ExternalSorter {
/// schema of the output (and the input)
// ========================================================================
// PROPERTIES:
// Fields that define the sorter's configuration and remain constant
// ========================================================================
/// Schema of the output (and the input)
schema: SchemaRef,
/// Sort expressions
expr: Arc<[PhysicalSortExpr]>,
/// If Some, the maximum number of output rows that will be produced
fetch: Option<usize>,
/// The target number of rows for output batches
batch_size: usize,
/// If the in size of buffered memory batches is below this size,
/// the data will be concatenated and sorted in place rather than
/// sort/merged.
sort_in_place_threshold_bytes: usize,

// ========================================================================
// STATE BUFFERS:
// Fields that hold intermediate data during sorting
// ========================================================================
/// Potentially unsorted in memory buffer
in_mem_batches: Vec<RecordBatch>,
/// if `Self::in_mem_batches` are sorted
in_mem_batches_sorted: bool,

/// If data has previously been spilled, the locations of the
/// spill files (in Arrow IPC format)
spills: Vec<RefCountedTempFile>,
/// Sort expressions
expr: Arc<[PhysicalSortExpr]>,

// ========================================================================
// EXECUTION RESOURCES:
// Fields related to managing execution resources and monitoring performance.
// ========================================================================
/// Runtime metrics
metrics: ExternalSorterMetrics,
/// If Some, the maximum number of output rows that will be
/// produced.
fetch: Option<usize>,
/// A handle to the runtime to get spill files
runtime: Arc<RuntimeEnv>,
/// Reservation for in_mem_batches
reservation: MemoryReservation,

/// Reservation for the merging of in-memory batches. If the sort
/// might spill, `sort_spill_reservation_bytes` will be
/// pre-reserved to ensure there is some space for this sort/merge.
merge_reservation: MemoryReservation,
/// A handle to the runtime to get spill files
runtime: Arc<RuntimeEnv>,
/// The target number of rows for output batches
batch_size: usize,
/// How much memory to reserve for performing in-memory sort/merges
/// prior to spilling.
sort_spill_reservation_bytes: usize,
/// If the in size of buffered memory batches is below this size,
/// the data will be concatenated and sorted in place rather than
/// sort/merged.
sort_in_place_threshold_bytes: usize,
}

impl ExternalSorter {
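
For readers new to these fields, here is a minimal sketch of how `sort_in_place_threshold_bytes` and `fetch` could interact, based only on the doc comments above. It is not DataFusion's implementation: `Strategy`, `choose_strategy`, `merge_sorted`, and `sort_batches` are hypothetical names, and plain `Vec<i64>` values stand in for `RecordBatch`. Spilling, memory reservations, and metrics are left out entirely.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical strategy names for illustration; not DataFusion types.
#[derive(Debug, PartialEq)]
enum Strategy {
    ConcatAndSortInPlace,
    SortThenMerge,
}

/// Picks a strategy the way the doc comment describes: buffered data
/// strictly below the threshold is concatenated and sorted in place.
fn choose_strategy(buffered_bytes: usize, sort_in_place_threshold_bytes: usize) -> Strategy {
    if buffered_bytes < sort_in_place_threshold_bytes {
        Strategy::ConcatAndSortInPlace
    } else {
        Strategy::SortThenMerge
    }
}

/// K-way merge of already sorted runs using a min-heap.
fn merge_sorted(runs: Vec<Vec<i64>>) -> Vec<i64> {
    let mut heap = BinaryHeap::new();
    for (run_idx, run) in runs.iter().enumerate() {
        if let Some(&value) = run.first() {
            heap.push(Reverse((value, run_idx, 0usize)));
        }
    }
    let mut out = Vec::with_capacity(runs.iter().map(Vec::len).sum());
    while let Some(Reverse((value, run_idx, pos))) = heap.pop() {
        out.push(value);
        if let Some(&next) = runs[run_idx].get(pos + 1) {
            heap.push(Reverse((next, run_idx, pos + 1)));
        }
    }
    out
}

/// Sorts the buffered "batches" and applies an optional row limit,
/// mirroring the role of `fetch` (assumed semantics: keep the first N rows).
fn sort_batches(
    batches: Vec<Vec<i64>>,
    sort_in_place_threshold_bytes: usize,
    fetch: Option<usize>,
) -> Vec<i64> {
    let buffered_bytes: usize = batches
        .iter()
        .map(|b| b.len() * std::mem::size_of::<i64>())
        .sum();

    let mut out = match choose_strategy(buffered_bytes, sort_in_place_threshold_bytes) {
        Strategy::ConcatAndSortInPlace => {
            let mut all: Vec<i64> = batches.into_iter().flatten().collect();
            all.sort();
            all
        }
        Strategy::SortThenMerge => {
            let runs = batches
                .into_iter()
                .map(|mut b| {
                    b.sort();
                    b
                })
                .collect();
            merge_sorted(runs)
        }
    };

    if let Some(limit) = fetch {
        out.truncate(limit);
    }
    out
}
```

Both paths produce the same sorted output; the threshold only decides whether the small-input shortcut (one concatenation, one sort) is taken instead of sorting each batch and merging the results.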