
Commit 087f34b

Refine documentation for unary_mut and binary_mut (#5798)
* Refine documentation for unary_mut and binary_mut
* Update arrow-array/src/array/primitive_array.rs
* Update binary_mut example to show different array types
Parent: fa8d350


4 files changed (+194 -67 lines)


arrow-arith/src/arity.rs

+85 -26
@@ -15,7 +15,7 @@
 // specific language governing permissions and limitations
 // under the License.
 
-//! Defines kernels suitable to perform operations to primitive arrays.
+//! Kernels for operating on [`PrimitiveArray`]s
 
 use arrow_array::builder::BufferBuilder;
 use arrow_array::types::ArrowDictionaryKeyType;
@@ -162,18 +162,38 @@ where
 }
 }
 
+/// Applies a binary infallible function to two [`PrimitiveArray`]s,
+/// producing a new [`PrimitiveArray`]
+///
+/// # Details
+///
 /// Given two arrays of length `len`, calls `op(a[i], b[i])` for `i` in `0..len`, collecting
-/// the results in a [`PrimitiveArray`]. If any index is null in either `a` or `b`, the
+/// the results in a [`PrimitiveArray`].
+///
+/// If any index is null in either `a` or `b`, the
 /// corresponding index in the result will also be null
 ///
-/// Like [`unary`] the provided function is evaluated for every index, ignoring validity. This
-/// is beneficial when the cost of the operation is low compared to the cost of branching, and
-/// especially when the operation can be vectorised, however, requires `op` to be infallible
-/// for all possible values of its inputs
+/// Like [`unary`], `op` is evaluated for every element in the two arrays,
+/// including those elements which are NULL. This is beneficial when the cost of
+/// the operation is low compared to the cost of branching, and especially when
+/// the operation can be vectorised; however, it requires `op` to be infallible for
+/// all possible values of its inputs.
 ///
-/// # Error
+/// # Errors
+///
+/// * If the arrays have different lengths.
 ///
-/// This function gives error if the arrays have different lengths
+/// # Example
+/// ```
+/// # use arrow_arith::arity::binary;
+/// # use arrow_array::{Float32Array, Int32Array};
+/// # use arrow_array::types::Int32Type;
+/// let a = Float32Array::from(vec![Some(5.1f32), None, Some(6.8), Some(7.2)]);
+/// let b = Int32Array::from(vec![1, 2, 4, 9]);
+/// // truncate a to i32 and add b; the null in a stays null in the result
+/// let c = binary(&a, &b, |a, b| a as i32 + b).unwrap();
+/// assert_eq!(c, Int32Array::from(vec![Some(6), None, Some(10), Some(16)]));
+/// ```
 pub fn binary<A, B, F, O>(
 a: &PrimitiveArray<A>,
 b: &PrimitiveArray<B>,
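The null-propagation rule for `binary` can be illustrated with a small std-only sketch. It assumes a hypothetical `SimpleArray` type (a `Vec` of values plus an optional `Vec<bool>` validity mask) in place of arrow's bitmap-backed `PrimitiveArray`, so it shows the documented semantics rather than the crate's actual implementation.

```rust
/// Hypothetical stand-in for a primitive array: one value per slot plus an
/// optional validity mask (`true` = valid, `None` = no nulls).
#[derive(Debug, PartialEq)]
pub struct SimpleArray<T> {
    pub values: Vec<T>,
    pub validity: Option<Vec<bool>>,
}

/// Sketch of the documented `binary` semantics (not arrow's implementation).
pub fn binary_sketch<A: Copy, B: Copy, O>(
    a: &SimpleArray<A>,
    b: &SimpleArray<B>,
    op: impl Fn(A, B) -> O,
) -> Result<SimpleArray<O>, String> {
    // Mismatched lengths are an error, mirroring the "# Errors" section.
    if a.values.len() != b.values.len() {
        return Err(format!(
            "arrays have different lengths: {} vs {}",
            a.values.len(),
            b.values.len()
        ));
    }
    // Evaluate `op` at every index, including null slots: no branch on validity.
    let values = a
        .values
        .iter()
        .zip(&b.values)
        .map(|(&x, &y)| op(x, y))
        .collect();
    // A result slot is valid only if it is valid in both inputs.
    let validity = match (&a.validity, &b.validity) {
        (None, None) => None,
        (Some(m), None) | (None, Some(m)) => Some(m.clone()),
        (Some(l), Some(r)) => Some(l.iter().zip(r).map(|(&x, &y)| x && y).collect()),
    };
    Ok(SimpleArray { values, validity })
}
```

Note that `op` still runs on the null slot and writes a value there; only the combined validity mask marks that slot as null.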
@@ -207,23 +227,70 @@ where
 Ok(PrimitiveArray::new(buffer.into(), nulls))
 }
 
-/// Given two arrays of length `len`, calls `op(a[i], b[i])` for `i` in `0..len`, mutating
-/// the mutable [`PrimitiveArray`] `a`. If any index is null in either `a` or `b`, the
-/// corresponding index in the result will also be null.
+/// Applies a binary and infallible function to values in two arrays, replacing
+/// the values in the first array in place.
+///
+/// # Details
+///
+/// Given two arrays of length `len`, calls `op(a[i], b[i])` for `i` in
+/// `0..len`, modifying the [`PrimitiveArray`] `a` in place, if possible.
+///
+/// If any index is null in either `a` or `b`, the corresponding index in the
+/// result will also be null.
 ///
-/// Mutable primitive array means that the buffer is not shared with other arrays.
-/// As a result, this mutates the buffer directly without allocating new buffer.
+/// # Buffer Reuse
+///
+/// If the underlying buffers in `a` are not shared with other arrays, mutates
+/// the underlying buffer in place, without allocating.
+///
+/// If the underlying buffers in `a` are shared, returns `Err(a)`.
 ///
 /// Like [`unary`], the provided function is evaluated for every index, ignoring validity. This
 /// is beneficial when the cost of the operation is low compared to the cost of branching, and
 /// especially when the operation can be vectorised; however, it requires `op` to be infallible
 /// for all possible values of its inputs.
 ///
-/// # Error
+/// # Errors
+///
+/// * If the arrays have different lengths
+/// * If the array is not mutable (see "Buffer Reuse")
+///
+/// # See Also
+///
+/// * Documentation on [`PrimitiveArray::unary_mut`] for operating on [`ArrayRef`].
 ///
-/// This function gives error if the arrays have different lengths.
-/// This function gives error of original [`PrimitiveArray`] `a` if it is not a mutable
-/// primitive array.
+/// # Example
+/// ```
+/// # use arrow_arith::arity::binary_mut;
+/// # use arrow_array::{Float32Array, Int32Array};
+/// # use arrow_array::types::Int32Type;
+/// // a and b may have different types (here Float32 and Int32)
+/// let a = Float32Array::from(vec![Some(5.1f32), None, Some(6.8)]);
+/// let b = Int32Array::from(vec![Some(1), None, Some(2)]);
+/// // compute a + b, updating the value in a in place if possible
+/// let a = binary_mut(a, &b, |a, b| a + b as f32).unwrap().unwrap();
+/// // a is updated in place
+/// assert_eq!(a, Float32Array::from(vec![Some(6.1), None, Some(8.8)]));
+/// ```
+///
+/// # Example with shared buffers
+/// ```
+/// # use arrow_arith::arity::binary_mut;
+/// # use arrow_array::Float32Array;
+/// # use arrow_array::types::Int32Type;
+/// let a = Float32Array::from(vec![Some(5.1f32), None, Some(6.8)]);
+/// let b = Float32Array::from(vec![Some(1.0f32), None, Some(2.0)]);
+/// // a_cloned shares the buffer with a
+/// let a_cloned = a.clone();
+/// // try to update a in place, but it is shared. Returns Err(a)
+/// let a = binary_mut(a, &b, |a, b| a + b).unwrap_err();
+/// assert_eq!(a_cloned, a);
+/// // drop shared reference
+/// drop(a_cloned);
+/// // now a is not shared, so we can update it in place
+/// let a = binary_mut(a, &b, |a, b| a + b).unwrap().unwrap();
+/// assert_eq!(a, Float32Array::from(vec![Some(6.1), None, Some(8.8)]));
+/// ```
 pub fn binary_mut<T, U, F>(
 a: PrimitiveArray<T>,
 b: &PrimitiveArray<U>,
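The "Buffer Reuse" contract and the nested return type of `binary_mut` can be sketched with `std::sync::Arc`. Here a hypothetical `SharedArray` holds its values in an `Arc<Vec<f32>>`, and `Arc::get_mut` stands in for arrow's own check that a buffer has no other owners. The outer `Err` hands the untouched input back, while the inner `Err` reports a length mismatch; this is an illustration of the contract, not the crate's code.

```rust
use std::sync::Arc;

/// Hypothetical array whose values live in a reference-counted buffer,
/// loosely mirroring how clones of an arrow array share buffers.
#[derive(Debug, Clone, PartialEq)]
pub struct SharedArray {
    pub values: Arc<Vec<f32>>,
}

/// Sketch of the `binary_mut` contract:
/// * `Err(a)`: `a`'s buffer is shared, so `a` is returned unchanged;
/// * `Ok(Err(..))`: the lengths differ;
/// * `Ok(Ok(a))`: `a` was updated in place, reusing its existing buffer.
pub fn binary_mut_sketch(
    mut a: SharedArray,
    b: &[f32],
    op: impl Fn(f32, f32) -> f32,
) -> Result<Result<SharedArray, String>, SharedArray> {
    if a.values.len() != b.len() {
        return Ok(Err(format!("length mismatch: {} vs {}", a.values.len(), b.len())));
    }
    // `Arc::get_mut` only succeeds when no other `Arc` (or `Weak`) points at the buffer.
    if Arc::get_mut(&mut a.values).is_none() {
        return Err(a);
    }
    let values = Arc::get_mut(&mut a.values).expect("uniqueness checked above");
    for (x, &y) in values.iter_mut().zip(b) {
        *x = op(*x, y);
    }
    Ok(Ok(a))
}
```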
@@ -319,15 +386,7 @@ where
 ///
 /// Like [`try_unary`], the function is only evaluated for non-null indices
 ///
-/// Mutable primitive array means that the buffer is not shared with other arrays.
-/// As a result, this mutates the buffer directly without allocating new buffer.
-///
-/// # Error
-///
-/// Return an error if the arrays have different lengths or
-/// the operation is under erroneous.
-/// This function gives error of original [`PrimitiveArray`] `a` if it is not a mutable
-/// primitive array.
+/// See [`binary_mut`] for errors and buffer reuse information.
 pub fn try_binary_mut<T, F>(
 a: PrimitiveArray<T>,
 b: &PrimitiveArray<T>,

arrow-array/src/array/primitive_array.rs

+98 -35
@@ -419,7 +419,7 @@ pub type Decimal256Array = PrimitiveArray<Decimal256Type>;
 
 pub use crate::types::ArrowPrimitiveType;
 
-/// An array of [primitive values](https://arrow.apache.org/docs/format/Columnar.html#fixed-size-primitive-layout)
+/// An array of primitive values, of type [`ArrowPrimitiveType`]
 ///
 /// # Example: From a Vec
 ///
@@ -480,6 +480,19 @@ pub use crate::types::ArrowPrimitiveType;
 /// assert_eq!(array.values(), &[1, 0, 2]);
 /// assert!(array.is_null(1));
 /// ```
+///
+/// # Example: Get a `PrimitiveArray` from an [`ArrayRef`]
+/// ```
+/// # use std::sync::Arc;
+/// # use arrow_array::{Array, cast::AsArray, ArrayRef, Float32Array, PrimitiveArray};
+/// # use arrow_array::types::{Float32Type};
+/// # use arrow_schema::DataType;
+/// # let array: ArrayRef = Arc::new(Float32Array::from(vec![1.2, 2.3]));
+/// // as_primitive() will panic if the array is not a Float32Array
+/// assert_eq!(&DataType::Float32, array.data_type());
+/// let f32_array: Float32Array = array.as_primitive().clone();
+/// assert_eq!(f32_array, Float32Array::from(vec![1.2, 2.3]));
+/// ```
 pub struct PrimitiveArray<T: ArrowPrimitiveType> {
 data_type: DataType,
 /// Values data
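`as_primitive` performs a checked downcast from a type-erased array reference, and arrow's `Array` trait exposes an `as_any` method for this kind of downcast. The std-only sketch below mimics the pattern with hypothetical `DynArray`, `F32Array` and `I32Array` types and `Any::downcast_ref`; unlike the panicking `as_primitive`, it returns `None` on a type mismatch.

```rust
use std::any::Any;
use std::sync::Arc;

/// Hypothetical trait standing in for a type-erased array such as `ArrayRef`.
pub trait DynArray {
    fn as_any(&self) -> &dyn Any;
}

/// Hypothetical concrete array types.
#[derive(Debug, Clone, PartialEq)]
pub struct F32Array(pub Vec<f32>);

#[derive(Debug, Clone, PartialEq)]
pub struct I32Array(pub Vec<i32>);

impl DynArray for F32Array {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl DynArray for I32Array {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Checked downcast: `Some` only if the erased value really is an `F32Array`.
/// A panicking variant would call `.expect(..)` on this result.
pub fn as_f32(array: &Arc<dyn DynArray>) -> Option<&F32Array> {
    array.as_any().downcast_ref::<F32Array>()
}
```

Cloning the downcast reference (as the doc example does with `.clone()`) yields an owned value that shares nothing with the type-erased wrapper in this sketch; with real arrow arrays a clone shares the underlying buffers instead.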
@@ -732,22 +745,34 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
 PrimitiveArray::from(unsafe { d.build_unchecked() })
 }
 
-/// Applies an unary and infallible function to a primitive array.
-/// This is the fastest way to perform an operation on a primitive array when
-/// the benefits of a vectorized operation outweigh the cost of branching nulls and non-nulls.
+/// Applies a unary infallible function to a primitive array, producing a
+/// new array of potentially different type.
+///
+/// This is the fastest way to perform an operation on a primitive array
+/// when the benefits of a vectorized operation outweigh the cost of
+/// branching nulls and non-nulls.
 ///
-/// # Implementation
+/// # See Also
+/// * [`Self::unary_mut`] for in-place modification.
+/// * [`Self::try_unary`] for fallible operations.
+/// * [`arrow::compute::binary`] for binary operations
+///
+/// [`arrow::compute::binary`]: https://docs.rs/arrow/latest/arrow/compute/fn.binary.html
+/// # Null Handling
+///
+/// Applies the function for all values, including those on null slots. This
+/// will often allow the compiler to generate faster vectorized code, but
+/// requires that the operation be infallible (never error or panic) for any
+/// value of the corresponding type, or this function may panic.
 ///
-/// This will apply the function for all values, including those on null slots.
-/// This implies that the operation must be infallible for any value of the corresponding type
-/// or this function may panic.
 /// # Example
 /// ```rust
-/// # use arrow_array::{Int32Array, types::Int32Type};
+/// # use arrow_array::{Int32Array, Float32Array, types::Int32Type};
 /// # fn main() {
 /// let array = Int32Array::from(vec![Some(5), Some(7), None]);
-/// let c = array.unary(|x| x * 2 + 1);
-/// assert_eq!(c, Int32Array::from(vec![Some(11), Some(15), None]));
+/// // Create a new array with the value of applying sqrt
+/// let c = array.unary(|x| f32::sqrt(x as f32));
+/// assert_eq!(c, Float32Array::from(vec![Some(2.236068), Some(2.6457512), None]));
 /// # }
 /// ```
 pub fn unary<F, O>(&self, op: F) -> PrimitiveArray<O>
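Because `unary` runs `op` on null slots too, whatever value happens to sit in a null slot must not make `op` fail. A hedged std-only sketch, using a hypothetical `MaskedArray` type rather than arrow's buffers: the map runs over every slot with no per-element validity branch, and the validity mask is copied unchanged.

```rust
/// Hypothetical array: values plus an optional validity mask (`true` = valid).
#[derive(Debug, PartialEq)]
pub struct MaskedArray<T> {
    pub values: Vec<T>,
    pub validity: Option<Vec<bool>>,
}

impl<T: Copy> MaskedArray<T> {
    /// Sketch of `unary` semantics: map every slot, null or not, and reuse
    /// the validity mask as-is. The loop body never inspects validity.
    pub fn unary_sketch<O>(&self, op: impl Fn(T) -> O) -> MaskedArray<O> {
        MaskedArray {
            values: self.values.iter().map(|&v| op(v)).collect(),
            validity: self.validity.clone(),
        }
    }
}
```

In a debug build, a closure such as `|x| x * 2 + 1` would panic with an overflow if a null slot happened to hold `i32::MAX`; wrapping (or otherwise total) arithmetic avoids that, which is the sense in which `op` must be infallible for every value of its input type.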
@@ -766,24 +791,50 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
 PrimitiveArray::new(buffer.into(), nulls)
 }
 
-/// Applies an unary and infallible function to a mutable primitive array.
-/// Mutable primitive array means that the buffer is not shared with other arrays.
-/// As a result, this mutates the buffer directly without allocating new buffer.
+/// Applies a unary and infallible function to the array in place if possible.
+///
+/// # Buffer Reuse
+///
+/// If the underlying buffers are not shared with other arrays, mutates the
+/// underlying buffer in place, without allocating.
+///
+/// If the underlying buffer is shared, returns `Err(self)`.
 ///
-/// # Implementation
+/// # Null Handling
+///
+/// See [`Self::unary`] for more information on null handling.
 ///
-/// This will apply the function for all values, including those on null slots.
-/// This implies that the operation must be infallible for any value of the corresponding type
-/// or this function may panic.
 /// # Example
+///
 /// ```rust
 /// # use arrow_array::{Int32Array, types::Int32Type};
-/// # fn main() {
 /// let array = Int32Array::from(vec![Some(5), Some(7), None]);
+/// // Apply x*2+1 to the data in place, no allocations
 /// let c = array.unary_mut(|x| x * 2 + 1).unwrap();
 /// assert_eq!(c, Int32Array::from(vec![Some(11), Some(15), None]));
-/// # }
 /// ```
+///
+/// # Example: modify [`ArrayRef`] in place, if not shared
+///
+/// It is also possible to modify an [`ArrayRef`] if there are no other
+/// references to the underlying buffer.
+///
+/// ```rust
+/// # use std::sync::Arc;
+/// # use arrow_array::{Array, cast::AsArray, ArrayRef, Int32Array, PrimitiveArray, types::Int32Type};
+/// # let array: ArrayRef = Arc::new(Int32Array::from(vec![Some(5), Some(7), None]));
+/// // Convert to Int32Array (panics if array.data_type() is not Int32)
+/// let a = array.as_primitive::<Int32Type>().clone();
+/// // Try to apply x*2+1 to the data in place; fails because `array` still shares the buffer
+/// a.unary_mut(|x| x * 2 + 1).unwrap_err();
+/// // Try again, this time dropping the last remaining reference
+/// let a = array.as_primitive::<Int32Type>().clone();
+/// drop(array);
+/// // Now we can apply the operation in place
+/// let c = a.unary_mut(|x| x * 2 + 1).unwrap();
+/// assert_eq!(c, Int32Array::from(vec![Some(11), Some(15), None]));
+/// ```
+
 pub fn unary_mut<F>(self, op: F) -> Result<PrimitiveArray<T>, PrimitiveArray<T>>
 where
 F: Fn(T::Native) -> T::Native,
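A design point worth spelling out: std's `Arc::make_mut` would silently clone a shared value, whereas `unary_mut` as documented returns `Err(self)` and leaves the choice to the caller. The sketch below models that on a plain `Arc<Vec<i32>>`; `unary_mut_sketch` and `apply` are hypothetical helpers, with `apply` falling back to an allocating map, much as a caller might fall back to `unary`.

```rust
use std::sync::Arc;

/// Hypothetical in-place update: succeeds only if `buf` is uniquely owned,
/// otherwise returns the input unchanged as `Err`.
pub fn unary_mut_sketch(
    mut buf: Arc<Vec<i32>>,
    op: impl Fn(i32) -> i32,
) -> Result<Arc<Vec<i32>>, Arc<Vec<i32>>> {
    if Arc::get_mut(&mut buf).is_none() {
        return Err(buf);
    }
    let values = Arc::get_mut(&mut buf).expect("uniqueness checked above");
    for v in values.iter_mut() {
        *v = op(*v);
    }
    Ok(buf)
}

/// Hypothetical caller-side policy: try in place, otherwise allocate a new buffer.
pub fn apply(buf: Arc<Vec<i32>>, op: impl Fn(i32) -> i32) -> Arc<Vec<i32>> {
    match unary_mut_sketch(buf, &op) {
        Ok(updated) => updated,
        Err(shared) => Arc::new(shared.iter().map(|&v| op(v)).collect()),
    }
}
```

Returning the input instead of copying keeps the cost of the call predictable: an in-place update never turns into a hidden allocation.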
@@ -796,11 +847,12 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
 Ok(builder.finish())
 }
 
-/// Applies a unary and fallible function to all valid values in a primitive array
+/// Applies a unary fallible function to all valid values in a primitive
+/// array, producing a new array of potentially different type.
 ///
-/// This is unlike [`Self::unary`] which will apply an infallible function to all rows
-/// regardless of validity, in many cases this will be significantly faster and should
-/// be preferred if `op` is infallible.
+/// Applies `op` to only rows that are valid, which is often significantly
+/// slower than [`Self::unary`], which should be preferred if `op` is
+/// infallible.
 ///
 /// Note: LLVM is currently unable to effectively vectorize fallible operations
 pub fn try_unary<F, O, E>(&self, op: F) -> Result<PrimitiveArray<O>, E>
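By contrast with `unary`, `try_unary` only calls `op` on valid rows, so a fallible `op` never sees the contents of null slots. The following sketch (a hypothetical `try_unary_sketch` over plain slices, not arrow's implementation) returns the first error it encounters and fills null slots with `O::default()`; that fill value is an assumption of the sketch.

```rust
/// Hypothetical sketch of `try_unary` semantics: `op` runs only where
/// `validity` is `true`; the first error aborts the whole operation.
pub fn try_unary_sketch<T: Copy, O: Default, E>(
    values: &[T],
    validity: &[bool],
    op: impl Fn(T) -> Result<O, E>,
) -> Result<Vec<O>, E> {
    let mut out = Vec::with_capacity(values.len());
    for (&v, &valid) in values.iter().zip(validity) {
        // One branch per element: this is what makes the loop hard to vectorize.
        out.push(if valid { op(v)? } else { O::default() });
    }
    Ok(out)
}
```

With a divisor of `0` sitting in a null slot, a checked `100 / x` succeeds here, whereas a branch-free `unary` evaluating `100 / x` on every slot would panic.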
@@ -829,13 +881,16 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
 Ok(PrimitiveArray::new(values, nulls))
 }
 
-/// Applies an unary and fallible function to all valid values in a mutable primitive array.
-/// Mutable primitive array means that the buffer is not shared with other arrays.
-/// As a result, this mutates the buffer directly without allocating new buffer.
+/// Applies a unary fallible function to all valid values in a mutable
+/// primitive array.
+///
+/// # Null Handling
+///
+/// See [`Self::try_unary`] for more information on null handling.
+///
+/// # Buffer Reuse
 ///
-/// This is unlike [`Self::unary_mut`] which will apply an infallible function to all rows
-/// regardless of validity, in many cases this will be significantly faster and should
-/// be preferred if `op` is infallible.
+/// See [`Self::unary_mut`] for more information on buffer reuse.
 ///
 /// This returns an `Err` when the input array shares its buffer with another
 /// array. In that case, the returned `Err` wraps the input array. If the function
@@ -870,9 +925,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
 
 /// Applies a unary and nullable function to all valid values in a primitive array
 ///
-/// This is unlike [`Self::unary`] which will apply an infallible function to all rows
-/// regardless of validity, in many cases this will be significantly faster and should
-/// be preferred if `op` is infallible.
+/// Applies `op` to only rows that are valid, which is often significantly
+/// slower than [`Self::unary`], which should be preferred if `op` is
+/// infallible.
 ///
 /// Note: LLVM is currently unable to effectively vectorize fallible operations
 pub fn unary_opt<F, O>(&self, op: F) -> PrimitiveArray<O>
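`unary_opt` follows the valid-rows-only rule, but its `op` returns an `Option`, and a `None` result turns that slot null. A minimal std-only sketch over plain slices; the function name is hypothetical, and filling null slots with `O::default()` is an assumption of the sketch rather than documented arrow behaviour.

```rust
/// Hypothetical sketch of `unary_opt` semantics: returns the new values and
/// the new validity mask. Null inputs stay null without calling `op`;
/// a `None` from `op` makes the output slot null as well.
pub fn unary_opt_sketch<T: Copy, O: Default>(
    values: &[T],
    validity: &[bool],
    op: impl Fn(T) -> Option<O>,
) -> (Vec<O>, Vec<bool>) {
    let mut out = Vec::with_capacity(values.len());
    let mut out_valid = Vec::with_capacity(values.len());
    for (&v, &valid) in values.iter().zip(validity) {
        let result = if valid { op(v) } else { None };
        match result {
            Some(o) => {
                out.push(o);
                out_valid.push(true);
            }
            None => {
                out.push(O::default());
                out_valid.push(false);
            }
        }
    }
    (out, out_valid)
}
```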
@@ -915,8 +970,16 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
 PrimitiveArray::new(values, Some(nulls))
 }
 
-/// Returns `PrimitiveBuilder` of this primitive array for mutating its values if the underlying
-/// data buffer is not shared by others.
+/// Returns a `PrimitiveBuilder` for this array, suitable for mutating values
+/// in place.
+///
+/// # Buffer Reuse
+///
+/// If the underlying data buffer has no other outstanding references, the
+/// buffer is used without copying.
+///
+/// If the underlying data buffer does have outstanding references, returns
+/// `Err(self)`.
 pub fn into_builder(self) -> Result<PrimitiveBuilder<T>, Self> {
 let len = self.len();
 let data = self.into_data();

arrow-array/src/types.rs

+4 -2
@@ -47,9 +47,11 @@ impl BooleanType {
 pub const DATA_TYPE: DataType = DataType::Boolean;
 }
 
-/// Trait bridging the dynamic-typed nature of Arrow (via [`DataType`]) with the
-/// static-typed nature of rust types ([`ArrowNativeType`]) for all types that implement [`ArrowNativeType`].
+/// Trait for [primitive values], bridging the dynamic-typed nature of Arrow
+/// (via [`DataType`]) with the static-typed nature of Rust types
+/// ([`ArrowNativeType`]) for all types that implement [`ArrowNativeType`].
 ///
+/// [primitive values]: https://arrow.apache.org/docs/format/Columnar.html#fixed-size-primitive-layout
 /// [`ArrowNativeType`]: arrow_buffer::ArrowNativeType
 pub trait ArrowPrimitiveType: primitive::PrimitiveTypeSealed + 'static {
 /// Corresponding Rust native type for the primitive type.
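The bridging role of `ArrowPrimitiveType` (a static Rust native type on one side, a runtime type tag on the other) can be illustrated with a cut-down version. `MiniDataType`, `MiniPrimitiveType` and the marker types below are hypothetical, invented for illustration; they only mimic the general shape of a trait that pairs an associated native type with a data-type constant.

```rust
/// Hypothetical, much-reduced runtime type tag (compare arrow's `DataType`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MiniDataType {
    Int32,
    Float32,
}

/// Hypothetical marker types, one per logical type.
pub struct MiniInt32Type;
pub struct MiniFloat32Type;

/// Each marker type ties a static Rust native type to a dynamic type tag.
pub trait MiniPrimitiveType {
    type Native: Copy + Default;
    const DATA_TYPE: MiniDataType;
}

impl MiniPrimitiveType for MiniInt32Type {
    type Native = i32;
    const DATA_TYPE: MiniDataType = MiniDataType::Int32;
}

impl MiniPrimitiveType for MiniFloat32Type {
    type Native = f32;
    const DATA_TYPE: MiniDataType = MiniDataType::Float32;
}

/// Generic code can build typed storage and still report the runtime tag.
pub fn zeroed<T: MiniPrimitiveType>(len: usize) -> (MiniDataType, Vec<T::Native>) {
    (T::DATA_TYPE, vec![<T::Native as Default>::default(); len])
}
```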

arrow-buffer/src/native.rs

+7 -4
@@ -22,11 +22,14 @@ mod private {
 pub trait Sealed {}
 }
 
-/// Trait expressing a Rust type that has the same in-memory representation
-/// as Arrow. This includes `i16`, `f32`, but excludes `bool` (which in arrow is represented in bits).
+/// Trait expressing a Rust type that has the same in-memory representation as
+/// Arrow.
 ///
-/// In little endian machines, types that implement [`ArrowNativeType`] can be memcopied to arrow buffers
-/// as is.
+/// This includes `i16` and `f32`, but excludes `bool` (which Arrow stores
+/// bit-packed, one bit per value).
+///
+/// On little-endian machines, types that implement [`ArrowNativeType`] can be
+/// copied byte-for-byte into Arrow buffers as is.
 ///
 /// # Transmute Safety
 ///
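The little-endian remark can be checked directly with std: on a little-endian target, viewing a `&[i32]` as raw bytes gives exactly the per-element `to_le_bytes` encoding, so the values can be copied into a byte buffer without any conversion. Both helper functions here are hypothetical illustrations, not part of `arrow_buffer`.

```rust
/// Hypothetical helper: view a slice of `i32` as its raw in-memory bytes
/// without copying.
pub fn i32_slice_as_bytes(values: &[i32]) -> &[u8] {
    // SAFETY: the pointer and length come from a valid `&[i32]`; `i32` has no
    // padding, `u8` has alignment 1, and `size_of_val` is the exact byte length.
    unsafe {
        std::slice::from_raw_parts(values.as_ptr() as *const u8, std::mem::size_of_val(values))
    }
}

/// Hypothetical helper: explicit little-endian encoding, element by element.
pub fn i32_slice_to_le_bytes(values: &[i32]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}
```

On a big-endian target the two results differ, which is why the doc comment restricts the "copy as is" claim to little-endian machines.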
