
Expose IntervalMonthDayNano and IntervalDayTime and update docs #5928

Merged: 4 commits merged on Jun 26, 2024
arrow-array/src/array/primitive_array.rs: 38 changes (35 additions, 3 deletions)
@@ -351,19 +351,51 @@ pub type Time64MicrosecondArray = PrimitiveArray<Time64MicrosecondType>;
/// hold values such as `00:02:00.123456789`
pub type Time64NanosecondArray = PrimitiveArray<Time64NanosecondType>;

/// A [`PrimitiveArray`] of “calendar” intervals in months
/// A [`PrimitiveArray`] of “calendar” intervals in whole months
///
/// See [`IntervalYearMonthType`] for details on representation and caveats.
///
/// # Example
/// ```
/// # use arrow_array::IntervalYearMonthArray;
/// let array = IntervalYearMonthArray::from(vec![
/// 2, // 2 months
/// 25, // 2 years and 1 month
/// -1 // -1 month
/// ]);
/// ```
pub type IntervalYearMonthArray = PrimitiveArray<IntervalYearMonthType>;

/// A [`PrimitiveArray`] of “calendar” intervals in days and milliseconds
///
/// See [`IntervalDayTimeType`] for details on representation and caveats.
/// See [`IntervalDayTime`] for details on representation and caveats.
///
/// # Example
/// ```
/// # use arrow_array::IntervalDayTimeArray;
/// use arrow_array::types::IntervalDayTime;
/// let array = IntervalDayTimeArray::from(vec![
/// IntervalDayTime::new(1, 1000), // 1 day, 1000 milliseconds
/// IntervalDayTime::new(33, 0), // 33 days, 0 milliseconds
/// IntervalDayTime::new(0, 12 * 60 * 60 * 1000), // 0 days, 12 hours
/// ]);
/// ```
pub type IntervalDayTimeArray = PrimitiveArray<IntervalDayTimeType>;

/// A [`PrimitiveArray`] of “calendar” intervals in months, days, and nanoseconds.
///
/// See [`IntervalMonthDayNanoType`] for details on representation and caveats.
/// See [`IntervalMonthDayNano`] for details on representation and caveats.
///
/// # Example
/// ```
/// # use arrow_array::IntervalMonthDayNanoArray;
/// use arrow_array::types::IntervalMonthDayNano;
/// let array = IntervalMonthDayNanoArray::from(vec![
/// IntervalMonthDayNano::new(1, 2, 1000), // 1 month, 2 days, 1000 nanoseconds (1 microsecond)
/// IntervalMonthDayNano::new(12, 1, 0), // 12 months, 1 day, 0 nanoseconds
/// IntervalMonthDayNano::new(0, 0, 12 * 1000 * 1000), // 0 months, 0 days, 12 milliseconds
/// ]);
/// ```
pub type IntervalMonthDayNanoArray = PrimitiveArray<IntervalMonthDayNanoType>;

/// A [`PrimitiveArray`] of elapsed durations in seconds
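The comments in the interval array examples above rely on some unit arithmetic: 25 months is 2 years and 1 month, 12 hours is `12 * 60 * 60 * 1000` milliseconds, 12 milliseconds is `12 * 1000 * 1000` nanoseconds, and 1000 nanoseconds is 1 microsecond. A minimal std-only sketch that spells this out is below; the constant names and the `years_and_months` helper are hypothetical illustrations, not part of `arrow_array`.

```rust
/// Hypothetical constant: milliseconds in one hour
/// (the `IntervalDayTime` example stores 12 hours this way).
const MILLIS_PER_HOUR: i32 = 60 * 60 * 1000;

/// Hypothetical constant: nanoseconds in one millisecond
/// (the `IntervalMonthDayNano` example stores 12 ms this way).
const NANOS_PER_MILLI: i64 = 1000 * 1000;

/// Hypothetical constant: nanoseconds in one microsecond,
/// so a value of `1000` nanoseconds is exactly 1 microsecond.
const NANOS_PER_MICRO: i64 = 1000;

/// Hypothetical helper: split a whole-month count (the value an
/// `IntervalYearMonthArray` stores) into `(years, months)`.
/// Rust's `/` and `%` truncate toward zero, so `-1` becomes `(0, -1)`.
fn years_and_months(total_months: i32) -> (i32, i32) {
    (total_months / 12, total_months % 12)
}
```

Truncating division keeps both parts of a negative interval non-positive, which matches reading `-1` as "minus one month" rather than "minus one year plus eleven months".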
arrow-array/src/types.rs: 76 changes (7 additions, 69 deletions)
@@ -23,7 +23,7 @@ use crate::delta::{
use crate::temporal_conversions::as_datetime_with_timezone;
use crate::timezone::Tz;
use crate::{ArrowNativeTypeOp, OffsetSizeTrait};
use arrow_buffer::{i256, Buffer, IntervalDayTime, IntervalMonthDayNano, OffsetBuffer};
use arrow_buffer::{i256, Buffer, OffsetBuffer};
use arrow_data::decimal::{validate_decimal256_precision, validate_decimal_precision};
use arrow_data::{validate_binary_view, validate_string_view};
use arrow_schema::{
@@ -36,6 +36,9 @@ use std::fmt::Debug;
use std::marker::PhantomData;
use std::ops::{Add, Sub};

// re-export types so that they can be used without importing arrow_buffer explicitly
Review comment (contributor, PR author): This is the code change.

pub use arrow_buffer::{IntervalDayTime, IntervalMonthDayNano};

// BooleanType is special: its bit-width is not the size of the primitive type, and its `index`
// operation assumes bit-packing.
/// A boolean datatype
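The re-export added in this hunk (`pub use arrow_buffer::{IntervalDayTime, IntervalMonthDayNano};`) makes one type reachable through two paths. The std-only sketch below illustrates that pattern with hypothetical modules `buffer_sketch` (standing in for `arrow_buffer`) and `types_sketch` (standing in for `arrow_array::types`); it is not the real crate layout. Both paths name the same type, so a value passes from one to the other without any conversion.

```rust
// Hypothetical stand-in for `arrow_buffer`: the module that defines the type.
mod buffer_sketch {
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct IntervalDayTime {
        pub days: i32,
        pub milliseconds: i32,
    }
}

// Hypothetical stand-in for `arrow_array::types`: it re-exports the type so
// users do not need to import the defining module explicitly.
mod types_sketch {
    pub use super::buffer_sketch::IntervalDayTime;
}

/// Takes the type through the re-exported path and returns it through the
/// defining path; this compiles only because both paths name one type.
fn via_reexport(v: types_sketch::IntervalDayTime) -> buffer_sketch::IntervalDayTime {
    v
}
```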
@@ -218,84 +221,19 @@ make_type!(
IntervalYearMonthType,
i32,
DataType::Interval(IntervalUnit::YearMonth),
"A “calendar” interval stored as the number of whole months."
"A 32-bit “calendar” interval type representing the number of whole months."
);
make_type!(
IntervalDayTimeType,
IntervalDayTime,
DataType::Interval(IntervalUnit::DayTime),
r#"A “calendar” interval type in days and milliseconds.
Review comment (contributor, PR author): I moved these docs to the new structured type, as I think they are easier to find there than on `IntervalDayTimeType`, which is defined to satisfy the Arrow trait bounds.

Review comment (contributor): Where do we re-export `i256`? We should probably do this in the same place and manner.

Reply (contributor, PR author): It appears to be in https://github.com/apache/arrow-rs/blob/cf59b6cd826412635dc391d4cf0f9d8310f5a226/arrow-buffer/src/lib.rs#L31-L30, which is defined here: https://github.com/apache/arrow-rs/blob/ce1a8fc664620f127115601880c07ed7657fd2ca/arrow-buffer/src/bigint/mod.rs#L58-L57

I don't think there is any documentation about the type to move:

/// The decimal type for a Decimal256Array
#[derive(Debug)]
pub struct Decimal256Type {}

So I am not quite sure what else to do here.


## Representation
This type is stored as a single 64-bit integer, interpreted as two i32 fields:
1. The number of elapsed days
2. The number of milliseconds (no leap seconds)

```text
┌──────────────┬──────────────┐
│     Days     │ Milliseconds │
│  (32 bits)   │  (32 bits)   │
└──────────────┴──────────────┘
0              32             64 bit offset
```
Please see the [Arrow Spec](https://github.com/apache/arrow/blob/081b4022fe6f659d8765efc82b3f4787c5039e3c/format/Schema.fbs#L406-L408) for more details.

## Note on Comparing and Ordering for Calendar Types

Values of `IntervalDayTimeType` are compared using their binary representation,
which can lead to surprising results. Please see the description of ordering on
[`IntervalMonthDayNanoType`] for more details.
"#
"A “calendar” interval type representing days and milliseconds. See [`IntervalDayTime`] for more details."
);
make_type!(
IntervalMonthDayNanoType,
IntervalMonthDayNano,
DataType::Interval(IntervalUnit::MonthDayNano),
r#"A “calendar” interval type in months, days, and nanoseconds.

## Representation
This type is stored as a single 128-bit integer,
interpreted as three different signed integral fields:

1. The number of months (32 bits)
2. The number of days (32 bits)
3. The number of nanoseconds (64 bits)

The nanoseconds field does not account for leap seconds.
Each field is independent (e.g. there is no constraint that the quantity of
nanoseconds represents less than a day's worth of time).

```text
┌───────────────┬─────────────┬─────────────────────────────┐
│     Months    │     Days    │            Nanos            │
│   (32 bits)   │  (32 bits)  │          (64 bits)          │
└───────────────┴─────────────┴─────────────────────────────┘
0               32            64                            128 bit offset
```
Please see the [Arrow Spec](https://github.com/apache/arrow/blob/081b4022fe6f659d8765efc82b3f4787c5039e3c/format/Schema.fbs#L409-L415) for more details.

## Note on Comparing and Ordering for Calendar Types
Values of `IntervalMonthDayNanoType` are compared using their binary representation,
which can lead to surprising results.

Spans of time measured in calendar units are not fixed in absolute size (e.g.
number of seconds), which makes defining comparisons and ordering nontrivial.
For example, `1 month` is 28 days in February (29 in a leap year) but 31 days
in December.

This makes the seemingly simple operation of comparing two intervals
complicated in practice. For example, is `1 month` more or less than `30 days`?
The answer depends on which month you are talking about.

This crate defines comparisons for calendar types using their binary
representation, which is fast and efficient but can lead to surprising results.

For example, an `IntervalMonthDayNano` of `1 month` compares as **greater**
than an `IntervalMonthDayNano` of `100 days`, because the binary
representation of `1 month` is larger than that of `100 days`.
"#
r"A “calendar” interval type representing months, days, and nanoseconds. See [`IntervalMonthDayNano`] for more details."
);
make_type!(
DurationSecondType,
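The `IntervalMonthDayNano` representation described above (two 32-bit fields followed by one 64-bit field, 128 bits in total, with no normalization between fields) can be checked with a std-only stand-in. `MonthDayNanoSketch`, `NANOS_PER_DAY`, and `SKETCH_SIZE` are hypothetical names; this sketch mirrors the documented layout and is not the `arrow_buffer` type itself.

```rust
/// Hypothetical stand-in mirroring the documented layout:
/// months (32 bits), days (32 bits), nanoseconds (64 bits).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(C)]
struct MonthDayNanoSketch {
    months: i32,
    days: i32,
    nanoseconds: i64,
}

/// 4 + 4 + 8 bytes: 16 bytes, i.e. 128 bits.
const SKETCH_SIZE: usize = std::mem::size_of::<MonthDayNanoSketch>();

/// Nanoseconds in one day, used below to show that fields are independent.
const NANOS_PER_DAY: i64 = 24 * 60 * 60 * 1_000_000_000;

impl MonthDayNanoSketch {
    fn new(months: i32, days: i32, nanoseconds: i64) -> Self {
        // Each field is stored exactly as given: nothing carries
        // nanoseconds over into days or days into months.
        Self { months, days, nanoseconds }
    }
}
```

For example, `MonthDayNanoSketch::new(0, 0, 2 * NANOS_PER_DAY)` keeps `days` at 0 and holds two days' worth of time entirely in `nanoseconds`.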
arrow-buffer/src/interval.rs: 70 changes (70 additions, 0 deletions)
@@ -19,6 +19,52 @@ use crate::arith::derive_arith;
use std::ops::Neg;

/// Value of an IntervalMonthDayNano array
///
/// ## Representation
///
/// This type is stored as a single 128-bit integer, interpreted as three
/// different signed integral fields:
///
/// 1. The number of months (32 bits)
/// 2. The number of days (32 bits)
/// 3. The number of nanoseconds (64 bits)
///
/// The nanoseconds field does not account for leap seconds.
///
/// Each field is independent (e.g. there is no constraint that the quantity of
/// nanoseconds represents less than a day's worth of time).
///
/// ```text
/// ┌───────────────┬─────────────┬─────────────────────────────┐
/// │     Months    │     Days    │            Nanos            │
/// │   (32 bits)   │  (32 bits)  │          (64 bits)          │
/// └───────────────┴─────────────┴─────────────────────────────┘
/// 0               32            64                            128 bit offset
/// ```
/// Please see the [Arrow Spec](https://github.com/apache/arrow/blob/081b4022fe6f659d8765efc82b3f4787c5039e3c/format/Schema.fbs#L409-L415) for more details.
///
/// ## Note on Comparing and Ordering for Calendar Types
///
/// Values of `IntervalMonthDayNano` are compared using their binary
/// representation, which can lead to surprising results.
///
/// Spans of time measured in calendar units are not fixed in absolute size (e.g.
/// number of seconds), which makes defining comparisons and ordering nontrivial.
/// For example, `1 month` is 28 days in February (29 in a leap year) but 31
/// days in December.
///
/// This makes the seemingly simple operation of comparing two intervals
/// complicated in practice. For example, is `1 month` more or less than
/// `30 days`? The answer depends on which month you are talking about.
///
/// This crate defines comparisons for calendar types using their binary
/// representation, which is fast and efficient but can lead to surprising
/// results.
///
/// For example, an `IntervalMonthDayNano` of `1 month` compares as **greater**
/// than an `IntervalMonthDayNano` of `100 days`, because the binary
/// representation of `1 month` is larger than that of `100 days`.
#[derive(Debug, Default, Copy, Clone, Eq, PartialEq, Hash, Ord, PartialOrd)]
#[repr(C)]
pub struct IntervalMonthDayNano {
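The ordering note above can be reproduced with a std-only stand-in. This sketch assumes, as the layout diagram suggests, that the fields are declared in the order months, days, nanoseconds, and that `Ord` is derived, as the `#[derive(...)]` line above shows. A derived `Ord` compares fields lexicographically in declaration order, so `months` is compared before `days`, which is enough to make `1 month` greater than `100 days`. `OrderedMonthDayNano` and `field_order_cmp` are hypothetical names.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in with the assumed field order. The derived
/// `PartialOrd`/`Ord` compare `months`, then `days`, then `nanoseconds`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct OrderedMonthDayNano {
    months: i32,
    days: i32,
    nanoseconds: i64,
}

/// Spells out the field-by-field comparison that the derive performs,
/// to make explicit that `days` only matters when `months` are equal.
fn field_order_cmp(a: &OrderedMonthDayNano, b: &OrderedMonthDayNano) -> Ordering {
    a.months
        .cmp(&b.months)
        .then(a.days.cmp(&b.days))
        .then(a.nanoseconds.cmp(&b.nanoseconds))
}
```

This ordering is cheap and total, which is why it suits sorting and hashing, but it says nothing about which interval is longer in elapsed time.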
@@ -272,6 +318,30 @@ derive_arith!(
);

/// Value of an IntervalDayTime array
///
/// ## Representation
///
/// This type is stored as a single 64-bit integer, interpreted as two i32
/// fields:
///
/// 1. The number of elapsed days
/// 2. The number of milliseconds (no leap seconds)
///
/// ```text
/// ┌──────────────┬──────────────┐
/// │     Days     │ Milliseconds │
/// │  (32 bits)   │  (32 bits)   │
/// └──────────────┴──────────────┘
/// 0              32             64 bit offset
/// ```
///
/// Please see the [Arrow Spec](https://github.com/apache/arrow/blob/081b4022fe6f659d8765efc82b3f4787c5039e3c/format/Schema.fbs#L406-L408) for more details.
///
/// ## Note on Comparing and Ordering for Calendar Types
///
/// Values of `IntervalDayTime` are compared using their binary representation,
/// which can lead to surprising results. Please see the description of ordering on
/// [`IntervalMonthDayNano`] for more details.
#[derive(Debug, Default, Copy, Clone, Eq, PartialEq, Hash, Ord, PartialOrd)]
#[repr(C)]
pub struct IntervalDayTime {
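The `IntervalDayTime` representation described above (two `i32` fields sharing one 64-bit word, days first) can be illustrated with a hypothetical `DayTimeSketch` type. This sketch assumes bit 0 is the least significant bit of a `u64` and packs days into the low 32 bits and milliseconds into the high 32 bits; it illustrates the diagram only, is not the `arrow_buffer` implementation, and does not address byte order in memory.

```rust
/// Hypothetical stand-in for the documented layout: two `i32` fields
/// that together occupy 64 bits.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(C)]
struct DayTimeSketch {
    days: i32,
    milliseconds: i32,
}

impl DayTimeSketch {
    /// Packs the fields into one word: days in bits 0..32,
    /// milliseconds in bits 32..64. Casting through `u32` first keeps
    /// negative values from sign-extending into the other field.
    fn to_bits(self) -> u64 {
        (self.days as u32 as u64) | ((self.milliseconds as u32 as u64) << 32)
    }

    /// Inverse of `to_bits`: truncating `as u32` casts extract each
    /// 32-bit field, and `as i32` restores its sign.
    fn from_bits(bits: u64) -> Self {
        Self {
            days: bits as u32 as i32,
            milliseconds: (bits >> 32) as u32 as i32,
        }
    }
}
```

Round-tripping negative values such as `days: -1, milliseconds: -5` through `to_bits` and `from_bits` returns the original pair, which is the property the two-field interpretation relies on.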