Optimize take/filter/concat from multiple input arrays to a single large output array #6692

alamb · 2024-11-05T19:17:06Z

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Downstream in DataFusion, there is a common pattern where we have multiple input RecordBatches and want to produce an output RecordBatch with some subset of the rows from the input batches. This happens in

FilterExec --> CoalesceBatchesExec when filtering
RepartitionExec --> CoalesceBatchesExec

The kernels used here are:

FilterExec uses filter, which takes a single input Array and produces a single output Array
RepartitionExec uses take, which also takes a single input Array and produces a single output Array
CoalesceBatchesExec calls concat, which takes multiple Arrays and produces a single Array as output

The use of these kernels and patterns has two downsides:

Performance overhead due to a second copy: Calling filter/take immediately copies the data, which is copied again in CoalesceBatches (see illustration below)
Memory Overhead / Performance Overhead for GarbageCollecting StringView: Buffering up several RecordBatches with StringView may consume significant amounts of memory for mostly filtered rows, which requires us to run gc periodically which actually slows some things down (see Reduce copying in CoalesceBatchesExec for StringViews datafusion#11628)

Here is an ascii art picture (from apache/datafusion#7957) that shows the extra copy in action


┌────────────────────┐        Filter                                                                          
│                    │                    ┌────────────────────┐            Coalesce                          
│                    │    ─ ─ ─ ─ ─ ─ ▶   │    RecordBatch     │             Batches                          
│    RecordBatch     │                    │   num_rows = 234   │─ ─ ─ ─ ─ ┐                                   
│  num_rows = 8000   │                    └────────────────────┘                                              
│                    │                                                    │                                   
│                    │                                                                ┌────────────────────┐  
└────────────────────┘                                                    │           │                    │  
┌────────────────────┐                    ┌────────────────────┐                      │                    │  
│                    │        Filter      │                    │          │           │                    │  
│                    │                    │    RecordBatch     │           ─ ─ ─ ─ ─ ▶│                    │  
│    RecordBatch     │    ─ ─ ─ ─ ─ ─ ▶   │   num_rows = 500   │─ ─ ─ ─ ─ ┐           │                    │  
│  num_rows = 8000   │                    │                    │                      │    RecordBatch     │  
│                    │                    │                    │          └ ─ ─ ─ ─ ─▶│  num_rows = 8000   │  
│                    │                    └────────────────────┘                      │                    │  
└────────────────────┘                                                                │                    │  
                                                    ...                    ─ ─ ─ ─ ─ ▶│                    │  
          ...                   ...                                       │           │                    │  
                                                                                      │                    │  
┌────────────────────┐                                                    │           └────────────────────┘  
│                    │                    ┌────────────────────┐                                              
│                    │       Filter       │                    │          │                                   
│    RecordBatch     │                    │    RecordBatch     │                                              
│  num_rows = 8000   │   ─ ─ ─ ─ ─ ─ ▶    │   num_rows = 333   │─ ─ ─ ─ ─ ┘                                   
│                    │                    │                    │                                              
│                    │                    └────────────────────┘                                              
└────────────────────┘                                                                                        
                                                                                                              
                      FilterExec                                          RepartitionExec copies the data     
                      creates output batches with copies                  *again* to form final large         
                      of  the matching rows (calls take()                 RecordBatches                       
                      to make a copy)
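
The two-copy pattern in the diagram can be sketched with plain `Vec`s standing in for Arrow arrays. This is only an illustration under that assumption: `filter_copy` and `concat_copy` are hypothetical stand-ins for the `filter` and `concat` kernels, and the counter simply tallies how many row copies the pipeline performs.

```rust
/// Stand-in for the `filter` kernel: copies selected rows into a new, small Vec.
fn filter_copy(input: &[i32], mask: &[bool]) -> Vec<i32> {
    input
        .iter()
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(v, _)| *v)
        .collect()
}

/// Stand-in for `concat`: copies every small batch again into one large Vec.
fn concat_copy(batches: &[Vec<i32>]) -> Vec<i32> {
    let total: usize = batches.iter().map(Vec::len).sum();
    let mut out = Vec::with_capacity(total);
    for b in batches {
        out.extend_from_slice(b);
    }
    out
}

/// Runs filter-then-coalesce and returns the output plus the number of row copies made.
fn filter_then_coalesce(inputs: &[(Vec<i32>, Vec<bool>)]) -> (Vec<i32>, usize) {
    // First copy: each input batch is filtered into its own small batch.
    let small: Vec<Vec<i32>> = inputs.iter().map(|(v, m)| filter_copy(v, m)).collect();
    let first_copies: usize = small.iter().map(Vec::len).sum();
    // Second copy: the small batches are concatenated into the final output.
    let out = concat_copy(&small);
    let copies = first_copies + out.len();
    (out, copies)
}
```

Every surviving row is copied twice in this model, which is the overhead the issue wants to remove.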

Describe the solution you'd like

I would like to apply filter/take to each incoming RecordBatch as it arrives, copying the data to an in progress output array, in a way that is as fast as the filter and take operations. This would reduce the extra copy that is currently required.

Note this is somewhat like the interleave kernel, except that

We only need the output rows to be in the same order as the input batches (so the second usize batch index is not needed)
We don't want to have to buffer all the input
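
To make the contrast with interleave concrete, here is a minimal sketch on plain slices. `interleave_like` mirrors the shape of an interleave call (all inputs available, `(batch, row)` index pairs), while `InOrderOutput` is a hypothetical streaming variant that sees one batch at a time and needs only row indices. Neither is arrow-rs code.

```rust
/// Interleave-style: needs every input batch up front plus (batch, row) pairs.
fn interleave_like(batches: &[&[i32]], indices: &[(usize, usize)]) -> Vec<i32> {
    indices.iter().map(|&(b, r)| batches[b][r]).collect()
}

/// Streaming variant (hypothetical): batches arrive one at a time and the
/// output order follows arrival order, so no batch index is needed.
struct InOrderOutput {
    values: Vec<i32>,
}

impl InOrderOutput {
    fn new() -> Self {
        Self { values: Vec::new() }
    }

    /// Appends `batch[r]` for each `r` in `rows`; `batch` can be dropped afterwards.
    fn push_rows(&mut self, batch: &[i32], rows: &[usize]) {
        self.values.extend(rows.iter().map(|&r| batch[r]));
    }
}
```

Both produce the same output when the interleave indices are already grouped by batch, but only the second lets each input be released as soon as it has been processed.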

Describe alternatives you've considered

One thing I have thought about is extending the builders so they can append more than one row at a time. For example:

Builder::append_filtered
Builder::append_take

So for example, to filter a stream of StringViewArrays I might do something like:

let mut builder = StringViewBuilder::new();
while let Some(input) = stream.next() {
  // compute some subset of input rows that make it to the output
  let filter: BooleanArray = compute_filter(&input, ....); 
  // append all rows from input where filter[i] is true
  builder.append_filtered(&input, &filter);
}

And also add an equivalent for append_take
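
As a rough, std-only model of what `append_filtered` and `append_take` could mean, the sketch below uses a toy nullable `i32` builder. `SketchBuilder` and its methods are hypothetical and are not part of arrow-rs; a real implementation would work on Arrow buffers and null bitmaps rather than `Vec<bool>`.

```rust
/// Toy builder for a nullable i32 column (hypothetical; not the arrow-rs builder).
#[derive(Default)]
struct SketchBuilder {
    values: Vec<i32>,
    validity: Vec<bool>,
}

impl SketchBuilder {
    /// Appends every row of `input` whose `filter` entry is true.
    fn append_filtered(&mut self, input: &[Option<i32>], filter: &[bool]) {
        assert_eq!(input.len(), filter.len());
        for (&v, &keep) in input.iter().zip(filter) {
            if keep {
                self.values.push(v.unwrap_or_default());
                self.validity.push(v.is_some());
            }
        }
    }

    /// Appends `input[i]` for each `i` in `indices`, in index order.
    fn append_take(&mut self, input: &[Option<i32>], indices: &[usize]) {
        for &i in indices {
            let v = input[i];
            self.values.push(v.unwrap_or_default());
            self.validity.push(v.is_some());
        }
    }

    /// Produces the finished column and resets the builder.
    fn finish(&mut self) -> Vec<Option<i32>> {
        let values = std::mem::take(&mut self.values);
        let validity = std::mem::take(&mut self.validity);
        values
            .into_iter()
            .zip(validity)
            .map(|(v, ok)| ok.then_some(v))
            .collect()
    }
}
```

The point of the shape is that rows from many inputs land in one in-progress output, so no intermediate per-batch array is materialized.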

I think if we did this right, it wouldn't be a lot of new code, we could just refactor the existing filter/take implementations. For example, I would expect that the filter kernel would then devolve into something like

fn filter(..) {
  match data_type {
    DataType::Int8 => Int8Builder::with_capacity(...)
      .append_filtered(input, filter)
      .build()
    ...
  }
}

Additional context


alamb · 2024-11-05T19:36:31Z

FYI

@XiangpengHao who I think I mentioned this to and
@tustvold in case you have some thoughts about why this is crazy
@Rachelint / @jayzhan211 / @Dandandan because you guys were talking about this on Support vectorized append and compare for multi group by datafusion#12996

devanbenz · 2024-11-07T01:44:40Z

@alamb just to clarify: your idea is to modify the existing take & filter kernels, not create a new one, right?

jayzhan211 · 2024-11-07T01:50:20Z

> @alamb just to clarify: your idea is to modify the existing take & filter kernels, not create a new one, right?

If creating a new one helps, no reason not to do it.

tustvold · 2024-11-07T04:39:40Z

I think the idea is sound in principle, but needs a concrete API proposal.

I'm not sure the proposed builder API makes sense, as the typing for nested types like ListBuilder and DictionaryBuilder is not what we want here, and they can't easily be type erased. We also ideally want to avoid overly bloating the arrow-array crate with kernel logic. This isn't even touching on the fact these kernels don't use the builders for performance reasons.

I think we'd need to introduce a new type-erased MutableArray abstraction or something, potentially replacing the rather problematic MutableArrayData.

The only remaining challenge concerns dictionaries, as the output dictionary needs to be computed up front. Simply not supporting dictionaries could potentially be a valid workaround though.

jayzhan211 · 2024-11-07T06:46:05Z

Not sure about other types, but for StringView I can only think of iterating over all the filtered rows and calling append_value one by one. If there is no further optimization we can do, I think we can implement the append logic in DataFusion.

devanbenz · 2024-11-07T12:38:15Z

> Not sure about other types, but for StringView I can only think of iterating over all the filtered rows and calling append_value one by one. If there is no further optimization we can do, I think we can implement the append logic in DataFusion.

'append_value' does a copy though. Wouldn't that effectively still be a large number of copies?

jayzhan211 · 2024-11-07T12:49:54Z

> > Not sure about other types, but for StringView I can only think of iterating over all the filtered rows and calling append_value one by one. If there is no further optimization we can do, I think we can implement the append logic in DataFusion.
>
> 'append_value' does a copy though. Wouldn't that effectively still be a large number of copies?

With this approach, we reduce the copying to a single step. Compared to the current approach, where copying happens in multiple stages (filtering, garbage collection, and coalescing), my proposal combines these steps into one. While benchmarks are needed to confirm any performance gains, this method should, at the very least, not perform worse than the existing one

devanbenz · 2024-11-07T13:07:01Z

> > > Not sure about other types, but for StringView I can only think of iterating over all the filtered rows and calling append_value one by one. If there is no further optimization we can do, I think we can implement the append logic in DataFusion.
> >
> > 'append_value' does a copy though. Wouldn't that effectively still be a large number of copies?
>
> With this approach, we reduce the copying to a single step. Compared to the current approach, where copying happens in multiple stages (filtering, garbage collection, and coalescing), my proposal combines these steps into one. While benchmarks are needed to confirm any performance gains, this method should, at the very least, not perform worse than the existing one.

That makes sense. This would be a change downstream within DataFusion then, correct?

jayzhan211 · 2024-11-07T13:25:16Z

> > > > Not sure about other types, but for StringView I can only think of iterating over all the filtered rows and calling append_value one by one. If there is no further optimization we can do, I think we can implement the append logic in DataFusion.
> > >
> > > 'append_value' does a copy though. Wouldn't that effectively still be a large number of copies?
> >
> > With this approach, we reduce the copying to a single step. Compared to the current approach, where copying happens in multiple stages (filtering, garbage collection, and coalescing), my proposal combines these steps into one. While benchmarks are needed to confirm any performance gains, this method should, at the very least, not perform worse than the existing one.
>
> That makes sense. This would be a change downstream within DataFusion then, correct?

Yes, you can work on it if you want to.

alamb · 2024-11-07T15:31:13Z

So it sounds like the consensus is to work out how this might look downstream in DataFusion (maybe starting with StringView as that is what is giving us the most trouble now) and then use some of that knowledge to propose something upstream in Arrow -- sounds like a good idea to me

alamb · 2024-11-07T15:33:09Z

> Not sure about other types, but for StringView I can only think of iterating over all the filtered rows and calling append_value one by one. If there is no further optimization we can do, I think we can implement the append logic in DataFusion.

@jayzhan211 yes I think this is effectively what would happen -- however the actual iteration over filtered values is quite optimized in the filter kernel (check out what the FilterBuilder does) based on how many values are filtered and other aspects

The fact that filter is so fast in arrow means it is quite hard to get as good / faster :)
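
To give a feel for the kind of optimization meant here, the sketch below picks a copy strategy from the filter's selectivity: bulk-copy contiguous runs when most rows survive, gather row by row otherwise. It is a simplified model on `Vec`s, not the arrow-rs code, and the 80% threshold is an assumption chosen for illustration.

```rust
/// Contiguous runs of `true` in the mask as half-open (start, end) ranges.
fn true_runs(mask: &[bool]) -> Vec<(usize, usize)> {
    let mut runs = Vec::new();
    let mut start = None;
    for (i, &keep) in mask.iter().enumerate() {
        match (keep, start) {
            (true, None) => start = Some(i),
            (false, Some(s)) => {
                runs.push((s, i));
                start = None;
            }
            _ => {}
        }
    }
    if let Some(s) = start {
        runs.push((s, mask.len()));
    }
    runs
}

/// Appends the selected rows of `input` to `out`, choosing a strategy by selectivity.
fn append_selected(out: &mut Vec<i32>, input: &[i32], mask: &[bool]) {
    assert_eq!(input.len(), mask.len());
    let selected = mask.iter().filter(|&&k| k).count();
    out.reserve(selected);
    // Illustrative threshold (assumption): use run copies when over 80% of rows survive.
    if selected * 10 > mask.len() * 8 {
        for (s, e) in true_runs(mask) {
            out.extend_from_slice(&input[s..e]);
        }
    } else {
        out.extend(input.iter().zip(mask).filter(|(_, k)| **k).map(|(&v, _)| v));
    }
}
```

Because `append_selected` writes into a caller-provided `out`, the same strategy choice could in principle serve an appending API as well as a one-shot filter.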

alamb · 2024-11-07T15:36:34Z

@tustvold

> I'm not sure the proposed builder API makes sense, as the typing for nested types like ListBuilder and DictionaryBuilder is not what we want here, and they can't easily be type erased. We also ideally want to avoid overly bloating the arrow-array crate with kernel logic.

This is reasonable -- though I could imagine adding type erased builders like DynListBuilder for this usecase

> This isn't even touching on the fact these kernels don't use the builders for performance reasons.

Is there some fundamental reason the builders can't be made faster? If we could make the builders fast enough to use for filter that would seem to be valuable in its own right. But I am likely just dreaming here

> The only remaining challenge concerns dictionaries, as the output dictionary needs to be computed up front. Simply not supporting dictionaries could potentially be a valid workaround though.

A builder based approach could help (e.g. optimize for the case where the input batches had the same dictionary and handle the case where they didn't -- either via deferred computation or on the fly or something else)

Rachelint · 2024-11-07T15:58:49Z

> Not sure about other types, but for StringView I can only think of iterating over all the filtered rows and calling append_value one by one. If there is no further optimization we can do, I think we can implement the append logic in DataFusion.

Does the approach look like this for filter?

loop over the input array and the input predicate
get the value of each needed row (true in the input predicate) and append_value it to the builder

At the least, we can avoid generating multiple small batches and concatenating them (a ton of copies) once they are big enough.

tustvold · 2024-11-07T16:07:09Z

> This is reasonable -- though I could imagine adding type erased builders like DynListBuilder for this usecase

This sort of partially type-erased API seems like the worst of both worlds, you either want something that is completely type-erased (e.g. MutableArrayData), or fully typed (e.g. ListBuilder).

I could see us adding some sort of MutableArray abstraction to arrow-select that allows appending values from arrays based on a mask and/or selection. This would be useful not just for this use-case, but potentially as a MutableBuffer abstraction for databases, etc... However, it would be very complex to implement, especially for dictionaries.

> Is there some fundamental reason the builders can't be made faster

Not without changing their APIs 😅. For the primitive builders one could simply move the current kernel implementations into the builders, but this doesn't really achieve much IMO.

> A builder based approach could help (e.g. optimize for the case where the input batches had the same dictionary and handle the case where they didn't -- either via deferred computation or on the fly or something else)

Yeah, it gets very complicated and fiddly. A similar challenge likely exists for StringView, although I'm not sure what level of sophistication we've reached w.r.t automatic GC.

> Does the approach look like this for filter?

That would be a very naive way to implement the filter kernel, I would encourage looking at what the selection kernels actually do.

Rachelint · 2024-11-07T16:44:28Z

> That would be a very naive way to implement the filter kernel, I would encourage looking at what the selection kernels actually do.

I agree that it is a naive version of filter.

Would it be possible to make something like filter_native public, but returning an iterator? Then we could reduce some copies downstream and still reuse the well-optimized filter in arrow:

// current
filter --> intermediate buffer in array --> final buffer

// optimized
filter --> final buffer
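
A minimal sketch of the iterator idea, using slices in place of Arrow arrays. `filter_iter` is a hypothetical name and is not an existing arrow-rs function; it only shows how a lazily evaluated filter lets the caller write selected rows straight into a final buffer.

```rust
/// Hypothetical iterator-returning filter: yields selected values lazily,
/// without materializing an intermediate array.
fn filter_iter<'a>(values: &'a [i32], mask: &'a [bool]) -> impl Iterator<Item = i32> + 'a {
    values
        .iter()
        .zip(mask.iter())
        .filter_map(|(&v, &keep)| keep.then_some(v))
}

/// Writes the selected rows of every batch straight into one final buffer.
fn filter_into_final(batches: &[(Vec<i32>, Vec<bool>)]) -> Vec<i32> {
    let mut final_buffer = Vec::new();
    for (values, mask) in batches {
        final_buffer.extend(filter_iter(values, mask));
    }
    final_buffer
}
```

One trade-off to keep in mind: a per-value iterator gives up the bulk slice copies that make dense filters fast, so an iterator over ranges might be a better fit in practice.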

jayzhan211 · 2024-11-08T06:10:17Z

arrow-rs/arrow-select/src/filter.rs

Line 740 in e9bf8aa

.add_buffers(array.data_buffers().to_vec());

This line of code extends the buffers (bytes copied) regardless of the filter result; I think it is the reason why we need gc. If we do append_value here, we have an additional hash lookup and insert, but fewer buffer bytes copied, especially at low selectivity, and no gc required later on.

alamb · 2024-11-08T12:30:11Z

> arrow-rs/arrow-select/src/filter.rs
>
> Line 740 in e9bf8aa
>
> .add_buffers(array.data_buffers().to_vec());
>
> This line of code extends the buffers (bytes copied) regardless of the filter result; I think it is the reason why we need gc. If we do append_value here, we have an additional hash lookup and insert, but fewer buffer bytes copied, especially at low selectivity, and no gc required later on.

To be clear -- I think the copy is of a Vec<Buffer> (which is a Vec of pointers to the data)
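
The distinction can be illustrated with reference-counted buffers from the standard library. In this sketch (an analogy, with `Arc<Vec<u8>>` standing in for Arrow's `Buffer` and a simplified view layout), cloning the `Vec` of buffers copies only pointers, yet the filtered output still keeps every byte of the original buffers alive, which is the situation that later calls for gc.

```rust
use std::sync::Arc;

/// Toy stand-in for a view array: views are (buffer index, offset, length)
/// into shared, reference-counted data buffers.
struct SketchViews {
    views: Vec<(usize, usize, usize)>,
    buffers: Vec<Arc<Vec<u8>>>,
}

/// Filters the views but carries over *all* buffers by cloning the Vec of Arcs.
fn filter_views(input: &SketchViews, mask: &[bool]) -> SketchViews {
    let views = input
        .views
        .iter()
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(v, _)| *v)
        .collect();
    // `to_vec` clones each Arc (a refcount bump), not the underlying bytes.
    SketchViews { views, buffers: input.buffers.to_vec() }
}

/// Bytes kept alive by the buffers versus bytes actually referenced by views.
fn retained_vs_used(a: &SketchViews) -> (usize, usize) {
    let retained: usize = a.buffers.iter().map(|b| b.len()).sum();
    let used: usize = a.views.iter().map(|v| v.2).sum();
    (retained, used)
}
```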

jayzhan211 · 2024-11-13T05:17:05Z

> This line of code extends the buffers (bytes copied) regardless of the filter result; I think it is the reason why we need gc. If we do append_value here, we have an additional hash lookup and insert, but fewer buffer bytes copied, especially at low selectivity, and no gc required later on.

I will try to implement a builder like this in DataFusion

jayzhan211 · 2024-11-16T14:05:03Z

> This line of code extends the buffers (bytes copied) regardless of the filter result; I think it is the reason why we need gc. If we do append_value here, we have an additional hash lookup and insert, but fewer buffer bytes copied, especially at low selectivity, and no gc required later on.

Didn't see an improvement with this approach: apache/datafusion#13450

alamb · 2024-11-18T15:57:24Z

> > This line of code extends the buffers (bytes copied) regardless of the filter result; I think it is the reason why we need gc. If we do append_value here, we have an additional hash lookup and insert, but fewer buffer bytes copied, especially at low selectivity, and no gc required later on.
>
> Didn't see an improvement with this approach: apache/datafusion#13450

I didn't have a chance to fully look at apache/datafusion#13450 -- can you summarize what approach it implemented?

jayzhan211 · 2024-11-19T00:26:42Z

The idea is to append to the string view array as early as possible, with optimal memory usage, to eliminate the need for garbage collection, and to aggregate the filtered rows into a single large batch instead of many small batches.

Current implementation

We first do Filter and then Coalesce in two different operators. The coalescer's push_batch tries to gc string view columns, and then concats those small batches again.

This PR

I try to combine Filter and Coalesce in the Filter operator, appending string view values to the coalescer until the batch size (8192) is reached or there are no more incoming batches. The result is then sent to Coalesce, which ideally should do nothing and pass it on to the next operator.

This approach has the additional cost of computing each view and looking up the view's hash again, but eliminates the need for gc.

The result matches my expectation that there is not much difference, and I currently have no further improvements in mind.
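
The combined filter-and-coalesce step described above can be modeled roughly as follows. `CompactingCoalescer` is a hypothetical, std-only stand-in, not the code from the PR: it copies only the selected strings into one compact buffer, so the output never references the input buffers and needs no later gc. It ignores details such as short strings being stored inline in the view, and an emitted batch may slightly exceed the target size.

```rust
/// Toy accumulator (hypothetical): keeps selected strings in one compact buffer.
struct CompactingCoalescer {
    target_rows: usize,
    data: Vec<u8>,
    views: Vec<(usize, usize)>, // (offset, len) into `data`
}

impl CompactingCoalescer {
    fn new(target_rows: usize) -> Self {
        Self { target_rows, data: Vec::new(), views: Vec::new() }
    }

    /// Appends the selected rows; returns a batch once `target_rows` is reached.
    fn push_filtered(&mut self, input: &[&str], mask: &[bool]) -> Option<Vec<String>> {
        for (s, keep) in input.iter().zip(mask) {
            if *keep {
                self.views.push((self.data.len(), s.len()));
                self.data.extend_from_slice(s.as_bytes());
            }
        }
        if self.views.len() >= self.target_rows {
            Some(self.finish())
        } else {
            None
        }
    }

    /// Emits whatever has been accumulated and resets the state.
    fn finish(&mut self) -> Vec<String> {
        let data = std::mem::take(&mut self.data);
        self.views
            .drain(..)
            .map(|(off, len)| String::from_utf8_lossy(&data[off..off + len]).into_owned())
            .collect()
    }
}
```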

ctsk · 2025-03-18T22:17:04Z

I think a good solution to this is worthwhile to implement -- from what I can see, it could eliminate many uses of CoalesceExec in datafusion.

> For the primitive builders one could simply move the current kernel implementations into the builders, but this doesn't really achieve much IMO.

Been there, done that 🙈 - for Primitive/Bytes/ByteView Arrays. It's not nice. Since I already did the menial work, I could benchmark the impact it has (when combined with some repartitioning changes in datafusion that take advantage of this). It does achieve avoiding the coalesce step / concatenating the short arrays after repartitioning. The caller also has to adjust (e.g. take care not to exceed the capacity of the builder to avoid resizing).

alamb · 2025-03-20T16:10:46Z

> I think a good solution to this is worthwhile to implement -- from what I can see, it could eliminate many uses of CoalesceExec in datafusion.

I agree

> Been there, done that 🙈 - for Primitive/Bytes/ByteView Arrays. It's not nice.

Does that mean you have an implementation that you could potentially share / contribute?

ctsk · 2025-03-24T13:28:52Z

> Does that mean you have an implementation that you could potentially share / contribute?

I've opened draft PRs for the changes in arrow (#7325) and in datafusion (apache/datafusion#15392).

In arrow, I added a "take_in" kernel, that takes an array, indices and a builder, then appends the elements of array at indices indices to the given builder. I also added a RecordBatchBuilder that holds a collection of ArrayBuilders for convenience (Similar to how a RecordBatch holds a collection of Arrays).

In datafusion, I tried modifying the RepartitionExec to use this API. This meant

Move the coalesce step closer to the take step: Currently, datafusion only coalesces after distributing the batch to the destination partition. To use this kind of API, we need to coalesce in each producing thread before distributing
Replace the take+coalesce combo with a RecordBatchBuilder

I suspect this currently fails to build due to the chrono issue...

alamb · 2025-03-24T21:13:10Z

Thanks @ctsk -- this looks pretty sweet -- I will try and give it a careful look tomorrow

ctsk · 2025-04-08T10:24:20Z

I've been contemplating how best to add an appending take kernel, and these are my current thoughts:

Coming from DataFusion...

Based on the experimental implementation of take_in, I believe the interface needed for DataFusion's use case looks like this:

trait ColumnSink {
    fn take(&mut self, source: &dyn Array, indices: &dyn Array);
    fn emit_finished(&mut self) -> Option<Arc<dyn Array>>;
    fn flush(&mut self);
}

ColumnSink maintains internal state, with ColumnSink::take logically appending the result of the take operation to its internal state. A key property is that ColumnSink can output its state incrementally rather than requiring a single vector output. This makes ColumnSink at least as capable as DataFusion's current approach, as it can fall back to using the current take kernel and leave concatenation to a CoalesceBatchesExec.
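
A small std-only model of this behaviour is sketched below. The trait is adapted to slices and `Vec`s in place of `&dyn Array` and `Arc<dyn Array>`, `FixedSizeSink` is a hypothetical implementation, and the semantics of `flush` (marking a partial batch as finished) are an assumption, since the comment does not spell them out.

```rust
use std::collections::VecDeque;

/// Std-only adaptation of the `ColumnSink` trait from the comment above.
trait ColumnSink {
    fn take(&mut self, source: &[i32], indices: &[usize]);
    fn emit_finished(&mut self) -> Option<Vec<i32>>;
    fn flush(&mut self);
}

/// Hypothetical sink that emits batches of exactly `batch_size` rows.
struct FixedSizeSink {
    batch_size: usize,
    in_progress: Vec<i32>,
    finished: VecDeque<Vec<i32>>,
}

impl FixedSizeSink {
    fn new(batch_size: usize) -> Self {
        assert!(batch_size > 0);
        Self {
            batch_size,
            in_progress: Vec::with_capacity(batch_size),
            finished: VecDeque::new(),
        }
    }
}

impl ColumnSink for FixedSizeSink {
    /// Appends `source[i]` for each index, sealing a batch whenever it fills up.
    fn take(&mut self, source: &[i32], indices: &[usize]) {
        for &i in indices {
            self.in_progress.push(source[i]);
            if self.in_progress.len() == self.batch_size {
                let next = Vec::with_capacity(self.batch_size);
                let full = std::mem::replace(&mut self.in_progress, next);
                self.finished.push_back(full);
            }
        }
    }

    /// Hands out completed batches one at a time, oldest first.
    fn emit_finished(&mut self) -> Option<Vec<i32>> {
        self.finished.pop_front()
    }

    /// Assumed semantics: mark any partial batch as finished.
    fn flush(&mut self) {
        if !self.in_progress.is_empty() {
            let rest = std::mem::take(&mut self.in_progress);
            self.finished.push_back(rest);
        }
    }
}
```

Because the sink owns the destination and its capacity, it can size each output batch exactly and avoid reallocations while appending.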

I believe this abstraction belongs to DataFusion rather than arrow-rs.

Allocation-minimal kernels in arrow-rs

Ideally, ColumnSink would produce perfectly-sized batches of a given size, requiring it to control the destination area's sizing. This calls for a version of take that writes into a provided buffer. I propose:

fn take_and_append(
   source: &dyn Array,
   indices: &dyn Array,
   destination: ???
) -> Result<(), ArrowError>;

The question is what type should replace ???. I've considered these options:

Option 1: `Box<dyn ArrayBuilder>` | `&mut dyn ArrayBuilder`

This was the starting point for this issue. ArrayBuilder is convenient because it already contains the appropriate data structures. While PrimitiveArrays hold all state in the result array, for DictionaryArrays, we'd want the destination to hold a hash table that won't be in the finished array.

The challenge with using ArrayBuilder is that builders don't sufficiently expose their internals for kernel implementations.

Potential approaches:

Expand the builder interface: Add methods giving callers more control over builder mutation. The challenge is finding reusable abstractions between builders, as well as balancing between unsafe methods that might leave the builder in an inconsistent state and enforcing consistency after each manipulation.
Deconstruction approach: Each builder would implement into_parts and new/new_unchecked, similar to what exists for Arrays. Kernels could arbitrarily mutate these buffers and then reconstruct the builder. This seems cleaner than expanding the builder interface further.

The deconstruction approach has the drawback of requiring ownership of the builder, whereas a mutable reference would make more sense semantically.
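
A minimal sketch of the deconstruction approach, assuming a toy builder with a values buffer and a validity mask. `PartsBuilder`, `into_parts`, and `try_new` here are hypothetical and only mirror the shape described above; the kernel mutates raw buffers freely and consistency is re-checked when the builder is reconstructed.

```rust
/// Toy builder (hypothetical) holding values plus a validity mask.
#[derive(Debug, Default)]
struct PartsBuilder {
    values: Vec<i32>,
    validity: Vec<bool>,
}

impl PartsBuilder {
    /// Gives up ownership of the internal buffers.
    fn into_parts(self) -> (Vec<i32>, Vec<bool>) {
        (self.values, self.validity)
    }

    /// Rebuilds a builder, checking the invariant a kernel could have broken.
    fn try_new(values: Vec<i32>, validity: Vec<bool>) -> Result<Self, String> {
        if values.len() != validity.len() {
            return Err(format!(
                "{} values but {} validity bits",
                values.len(),
                validity.len()
            ));
        }
        Ok(Self { values, validity })
    }
}

/// A "kernel" that mutates the raw buffers directly, then reconstructs the builder.
fn take_and_append(
    builder: PartsBuilder,
    source: &[i32],
    indices: &[usize],
) -> Result<PartsBuilder, String> {
    let (mut values, mut validity) = builder.into_parts();
    values.extend(indices.iter().map(|&i| source[i]));
    // All appended rows are valid in this toy example.
    validity.resize(values.len(), true);
    PartsBuilder::try_new(values, validity)
}
```

Note how `take_and_append` has to take the builder by value and hand it back, which is the ownership drawback mentioned above.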

Option 2: `&mut MutableArray`

Taking the deconstruction approach further, we could introduce mutable counterparts (MutablePrimitiveArray<T>, etc.) for each array type as the output of into_parts.

We would add a type-erased MutableArray abstraction on top. This type would essentially hold the same state as a builder but could be more freely mutated without many consistency guarantees. Consistency would be checked when converting from MutableArray to Array.

Each concrete mutable array would expose methods to manipulate each of its fields, making its internal data and workings transparent.

The trait could be:

trait MutableArray {
   fn data_type(&self) -> DataType;
   fn len(&self) -> usize;
   fn capacity(&self) -> usize;
   fn into_array(self) -> Arc<dyn Array>;
   fn into_builder(self) -> Arc<dyn ArrayBuilder>;
}

It wouldn't need common methods to manipulate the MutableArray since kernels would downcast it before manipulation. It could feature a method to modify the optional null buffer.

All names are subject to change

Thoughts?

tustvold · 2025-04-08T11:24:53Z

Perhaps it might be worth thinking about what use-cases we're trying to improve with this effort, this will ensure we design something that adequately addresses that use-case?

If we're just talking about PrimitiveArray and StringViewArray types, then I suspect any performance delta is likely to be relatively minor as concatenating such arrays is already extremely cheap.

If, however, we're looking to improve the performance of DictionaryArray, this becomes a whole different can of worms as any append-based interface is likely to struggle to efficiently handle arrays with heterogeneous dictionary values. I'm not sure if there is a good solution here tbh.

The only array types where I could see such an append interface potentially having compelling performance benefits are (Large)StringArray, as it would allow eliding potentially large string copies. That being said this would be reliant on knowing the expected amount of string data up-front, which an append interface won't necessarily know, and use-cases should probably just use StringViewArray...

The initial issue also stated

> Memory Overhead / Performance Overhead for GarbageCollecting StringView: Buffering up several RecordBatches with StringView may consume significant amounts of memory for mostly filtered rows, which requires us to run gc periodically which actually slows some things down (see apache/datafusion#11628)

But I am honestly not entirely sure how an append interface really changes this, you need to perform some sort of GC at some point, it is unclear to me why doing it as part of a Coalesce operation or as part of the filter itself would behave materially differently...

Edit: In fact it looks like @jayzhan211 tried this and confirmed it didn't make much difference - #6692 (comment)

alamb · 2025-04-08T13:35:57Z

> Perhaps it might be worth thinking about what use-cases we're trying to improve with this effort, this will ensure we design something that adequately addresses that use-case?

The core usecase in my mind is to save a copy (and associated allocation overheads, etc) when building up an output array from subsets of multiple input arrays.

Today this operation requires using a two step process with two kernels:

filter or take --> intermediate and then concat to form the output

In certain queries in DataFusion this copying shows up in profiles. For example, it appears in queries with relatively unselective filters that involve Strings, such as these clickbench queries where the predicate SearchPhrase <> '' passes the long strings through.

Example

SELECT "SearchPhrase" FROM hits WHERE "SearchPhrase" <> '' ORDER BY "EventTime" LIMIT 10;

The theory is that by eliminating the intermediate copy and building the desired output array directly we will improve performance

The same pattern also happens when repartitioning high cardinality aggregates (again on Strings)

alamb · 2025-04-08T13:44:04Z

> Thoughts?

I still don't understand why extending the ArrayBuilders with "native" support for appending filtered/take output isn't a viable solution, such as:

let mut builder = StringViewBuilder::new();
while let Some(input) = stream.next() {
  // compute some subset of input rows that make it to the output
  let filter: BooleanArray = compute_filter(&input, ....); 
  // append all rows from input where filter[i] is true
  builder.append_filtered(&input, &filter);
}

> The challenge with using ArrayBuilder is that builders don't sufficiently expose their internals for kernel implementations.

This is exactly why I am suggesting the filter/take thing in the ArrayBuilder itself -- where the low level details of the source/target arrays can be used.

If the concern is that adding filter/take code in the arrow-array module would increase code bloat too much I am sure we could sort that out somehow.

alamb · 2025-04-08T13:46:15Z

Maybe to @tustvold 's point, someone can start the project by implementing some benchmarks illustrating what we are trying to do:

Create a bunch of input arrays
apply filter (and take) and call concat to create an output array

Then we can properly consider how to make new APIs faster
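
As a starting point, the benchmark shape could look like the std-only sketch below. It is only an outline under stated assumptions: `Vec<i64>` stands in for Arrow arrays, all function names are hypothetical, `keep_every` must be non-zero, and a real benchmark would use the actual filter, take, and concat kernels with a proper benchmarking harness.

```rust
use std::time::{Duration, Instant};

/// Builds `n_batches` inputs of `rows` values each; keeps rows where `i % keep_every == 0`.
fn make_inputs(n_batches: usize, rows: usize, keep_every: usize) -> Vec<(Vec<i64>, Vec<bool>)> {
    (0..n_batches)
        .map(|b| {
            let values: Vec<i64> = (0..rows).map(|i| (b * rows + i) as i64).collect();
            let mask: Vec<bool> = (0..rows).map(|i| i % keep_every == 0).collect();
            (values, mask)
        })
        .collect()
}

/// Baseline: filter each batch into its own Vec, then concatenate.
fn filter_then_concat(inputs: &[(Vec<i64>, Vec<bool>)]) -> Vec<i64> {
    let small: Vec<Vec<i64>> = inputs
        .iter()
        .map(|(v, m)| v.iter().zip(m).filter(|(_, k)| **k).map(|(x, _)| *x).collect())
        .collect();
    small.concat()
}

/// Candidate: append selected rows directly to the output.
fn filter_into_output(inputs: &[(Vec<i64>, Vec<bool>)]) -> Vec<i64> {
    let mut out = Vec::new();
    for (v, m) in inputs {
        out.extend(v.iter().zip(m).filter(|(_, k)| **k).map(|(x, _)| *x));
    }
    out
}

/// Times one run of `f`, returning its result and the elapsed wall-clock time.
fn time_it<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}
```

Running both variants through `time_it` on the same inputs, across a range of selectivities and data types, would show where a direct-append API has room to help.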

tustvold · 2025-04-08T14:08:15Z

> The theory is that by eliminating the intermediate copy and building the desired output array directly we will improve performance

Is this using StringViewArray or StringArray? I can see there being potential benefits for the latter, but then I wonder if it is worthwhile expending effort on an array type we are trying to move away from...

> I still don't understand why extending the ArrayBuilders with "native" support for appending filtered/take output isn't a viable solution

It is viable, but isn't trivial (especially for dictionaries or nested arrays) and it isn't clear at least to me why it would make all that much of a performance difference for anything other than StringArray

alamb · 2025-04-08T14:23:50Z

> Is this using StringViewArray or StringArray? I can see there being potential benefits for the latter, but then I wonder if it is worthwhile expending effort on an array type we are trying to move away from...

It was both as I recall, but it probably bears double checking.

alamb added the enhancement Any new improvement worthy of a entry in the changelog label Nov 5, 2024

alamb changed the title ~~Optimize take/filter from multiple input arrays to a single output array~~ Optimize take/filter from multiple input arrays to a single large output array Nov 5, 2024

This was referenced Nov 5, 2024

Potential performance regression for TPCH q18 apache/datafusion#13188

Open

Support vectorized append and compare for multi group by apache/datafusion#12996

Merged

alamb mentioned this issue Nov 19, 2024

[not improved] Filter coalesce apache/datafusion#13450

Closed

This was referenced Nov 25, 2024

Add shrink_to_fit to Array #6360

Closed

Add support StringView / BinaryView in interleave kernel #6779

Merged

Improve performance by specializing mutable array data for each data type #6815

Closed

andygrove mentioned this issue Jan 8, 2025

Optimize repartitioning logic in ShuffleWriterExec using interleave_record_batch apache/datafusion-comet#1235

Closed

alamb mentioned this issue Mar 7, 2025

[EPIC] (Even More) Grouping / Group By / Aggregation Performance apache/datafusion#7000

Open

17 tasks

This was referenced Mar 25, 2025

Add take_in kernel #7325

Draft

Improve concat performance, and add append_array for some array builder implementations #7309

Merged

alamb mentioned this issue May 12, 2025

Poc for adaptive parquet predicate pushdown(bitmap/range) with page cache(3 data pages) #7454

Open

alamb changed the title ~~Optimize take/filter from multiple input arrays to a single large output array~~ Optimize take/filter/concat from multiple input arrays to a single large output array May 15, 2025
