Improve docs for Exprs and scalar functions #16036

Open: wants to merge 3 commits into base: main
56 changes: 30 additions & 26 deletions datafusion/expr/src/expr.rs
@@ -312,35 +312,15 @@ pub enum Expr {
Negative(Box<Expr>),
/// Whether an expression is between a given range.
Between(Between),
- /// The CASE expression is similar to a series of nested if/else and there are two forms that
- /// can be used. The first form consists of a series of boolean "when" expressions with
- /// corresponding "then" expressions, and an optional "else" expression.
- ///
- /// ```text
- /// CASE WHEN condition THEN result
- /// [WHEN ...]
- /// [ELSE result]
- /// END
- /// ```
- ///
- /// The second form uses a base expression and then a series of "when" clauses that match on a
- /// literal value.
- ///
- /// ```text
- /// CASE expression
- /// WHEN value THEN result
- /// [WHEN ...]
- /// [ELSE result]
- /// END
- /// ```
+ /// A CASE expression (see docs on [`Case`])
Case(Case),
/// Casts the expression to a given type and will return a runtime error if the expression cannot be cast.
/// This expression is guaranteed to have a fixed type.
Cast(Cast),
/// Casts the expression to a given type and will return a null value if the expression cannot be cast.
/// This expression is guaranteed to have a fixed type.
TryCast(TryCast),
- /// Represents the call of a scalar function with a set of arguments.
+ /// Call a scalar function with a set of arguments.
ScalarFunction(ScalarFunction),
/// Calls an aggregate function with arguments, and optional
/// `ORDER BY`, `FILTER`, `DISTINCT` and `NULL TREATMENT`.
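
The difference between the `Cast` and `TryCast` docs in this hunk can be sketched in plain Rust. This is a toy model and not DataFusion code: `cast_to_i64` and `try_cast_to_i64` are hypothetical helpers, with an `Err` standing in for the runtime error and `None` standing in for a SQL null.

```rust
/// Toy stand-in for `Cast` (hypothetical helper, not a DataFusion API):
/// a value that cannot be converted is reported as an error.
fn cast_to_i64(input: &str) -> Result<i64, String> {
    input
        .trim()
        .parse::<i64>()
        .map_err(|e| format!("cannot cast {input:?} to Int64: {e}"))
}

/// Toy stand-in for `TryCast` (hypothetical helper, not a DataFusion API):
/// a value that cannot be converted becomes null, modeled here as `None`.
fn try_cast_to_i64(input: &str) -> Option<i64> {
    input.trim().parse::<i64>().ok()
}
```

In both helpers the output type is fixed by the target of the cast, which mirrors the "guaranteed to have a fixed type" note in the docs.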
@@ -349,7 +329,7 @@ pub enum Expr {
///
/// [`ExprFunctionExt`]: crate::expr_fn::ExprFunctionExt
AggregateFunction(AggregateFunction),
- /// Represents the call of a window function with arguments.
+ /// Call a window function with a set of arguments.
WindowFunction(WindowFunction),
/// Returns whether the list contains the expr value.
InList(InList),
@@ -378,7 +358,7 @@ pub enum Expr {
/// A placeholder for parameters in a prepared statement
/// (e.g. `$foo` or `$1`)
Placeholder(Placeholder),
- /// A place holder which hold a reference to a qualified field
+ /// A placeholder which holds a reference to a qualified field
/// in the outer query, used for correlated sub queries.
OuterReferenceColumn(DataType, Column),
/// Unnest expression
@@ -551,6 +531,28 @@ impl Display for BinaryExpr {
}

/// CASE expression
+ ///
+ /// The CASE expression is similar to a series of nested if/else and there are two forms that
+ /// can be used. The first form consists of a series of boolean "when" expressions with
+ /// corresponding "then" expressions, and an optional "else" expression.
+ ///
+ /// ```text
+ /// CASE WHEN condition THEN result
+ /// [WHEN ...]
+ /// [ELSE result]
+ /// END
+ /// ```
+ ///
+ /// The second form uses a base expression and then a series of "when" clauses that match on a
+ /// literal value.
+ ///
+ /// ```text
+ /// CASE expression
+ /// WHEN value THEN result
+ /// [WHEN ...]
+ /// [ELSE result]
+ /// END
+ /// ```
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Hash)]
pub struct Case {
/// Optional base expression that can be compared to literal values in the "when" expressions
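
The two CASE forms documented on `Case` can be illustrated with a small stand-alone sketch. It is a toy model over `i64` values, not DataFusion's `Case` struct: `searched_case` and `simple_case` are hypothetical names, and `None` plays the role of the SQL null produced when no branch matches and there is no ELSE.

```rust
// Toy model of the two CASE forms; none of these names exist in DataFusion.

/// First form: `CASE WHEN condition THEN result ... [ELSE result] END`.
/// Each branch is a (condition, result) pair; the first true condition wins.
fn searched_case(branches: &[(bool, i64)], else_result: Option<i64>) -> Option<i64> {
    branches
        .iter()
        .find(|(condition, _)| *condition)
        .map(|(_, result)| *result)
        .or(else_result)
}

/// Second form: `CASE expression WHEN value THEN result ... [ELSE result] END`.
/// Each branch is a (value, result) pair compared against the base expression.
fn simple_case(base: i64, branches: &[(i64, i64)], else_result: Option<i64>) -> Option<i64> {
    branches
        .iter()
        .find(|(value, _)| *value == base)
        .map(|(_, result)| *result)
        .or(else_result)
}
```

The only difference between the two functions is what a branch is tested against: a boolean condition in the first form, equality with the base expression in the second.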
@@ -631,7 +633,9 @@ impl Between {
}
}

- /// ScalarFunction expression invokes a built-in scalar function
+ /// Invoke a [`ScalarUDF`] with a set of arguments
+ ///
+ /// [`ScalarUDF`]: crate::ScalarUDF
#[derive(Clone, PartialEq, Eq, PartialOrd, Hash, Debug)]
pub struct ScalarFunction {
/// The function
@@ -648,7 +652,7 @@ impl ScalarFunction {
}

impl ScalarFunction {
- /// Create a new ScalarFunction expression with a user-defined function (UDF)
+ /// Create a new `ScalarFunction` from a [`ScalarUDF`]
pub fn new_udf(udf: Arc<crate::ScalarUDF>, args: Vec<Expr>) -> Self {
Self { func: udf, args }
}
45 changes: 27 additions & 18 deletions datafusion/expr/src/udf.rs
@@ -34,19 +34,19 @@ use std::sync::Arc;
///
/// A scalar function produces a single row output for each row of input. This
/// struct contains the information DataFusion needs to plan and invoke
- /// functions you supply such name, type signature, return type, and actual
+ /// functions you supply such as name, type signature, return type, and actual
/// implementation.
///
/// 1. For simple use cases, use [`create_udf`] (examples in [`simple_udf.rs`]).
///
/// 2. For advanced use cases, use [`ScalarUDFImpl`] which provides full API
/// access (examples in [`advanced_udf.rs`]).
///
- /// See [`Self::call`] to invoke a `ScalarUDF` with arguments.
+ /// See [`Self::call`] to create an `Expr` which invokes a `ScalarUDF` with arguments.
///
/// # API Note
///
- /// This is a separate struct from `ScalarUDFImpl` to maintain backwards
+ /// This is a separate struct from [`ScalarUDFImpl`] to maintain backwards
/// compatibility with the older API.
///
/// [`create_udf`]: crate::expr_fn::create_udf
@@ -451,9 +451,9 @@ pub trait ScalarUDFImpl: Debug + Send + Sync {
///
/// # Notes
///
- /// Most UDFs should implement [`Self::return_type`] and not this
- /// function as the output type for most functions only depends on the types
- /// of their inputs (e.g. `sqrt(f32)` is always `f32`).
+ /// Most UDFs should implement [`Self::return_type`] and not this function,
+ /// as the output type for most functions only depends on the types of their
+ /// inputs (e.g. `sqrt(f32)` is always `f32`).
///
/// This function can be used for more advanced cases such as:
///
@@ -547,13 +547,15 @@ pub trait ScalarUDFImpl: Debug + Send + Sync {
}

/// Returns true if some of this expression's subexpressions may not be evaluated
- /// and thus any side effects (like divide by zero) may not be encountered
- /// Setting this to true prevents certain optimizations such as common subexpression elimination
+ /// and thus any side effects (like divide by zero) may not be encountered.
+ ///
+ /// Setting this to true prevents certain optimizations such as common
+ /// subexpression elimination.
fn short_circuits(&self) -> bool {
false
}
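
To see why a short-circuiting function matters to the optimizer, consider the following stand-alone sketch. It uses a toy expression tree (`Toy`, `eval`, `guarded_div`, and `eval_with_hoisting` are all hypothetical names, not DataFusion types) and assumes an optimization that evaluates a guarded subexpression ahead of time, as a rewrite like common subexpression elimination might.

```rust
// Toy expression tree for illustration only; not DataFusion's `Expr`.
enum Toy {
    Lit(i64),
    Div(Box<Toy>, Box<Toy>),
    /// Short-circuiting choice: `then` is evaluated only if `cond` is non-zero.
    IfNonZero { cond: Box<Toy>, then: Box<Toy>, otherwise: Box<Toy> },
}

/// Evaluates the tree, skipping whichever branch of `IfNonZero` is not taken.
fn eval(expr: &Toy) -> Result<i64, String> {
    match expr {
        Toy::Lit(v) => Ok(*v),
        Toy::Div(l, r) => {
            let (l, r) = (eval(l)?, eval(r)?);
            l.checked_div(r)
                .ok_or_else(|| "divide by zero or overflow".to_string())
        }
        Toy::IfNonZero { cond, then, otherwise } => {
            if eval(cond)? != 0 { eval(then) } else { eval(otherwise) }
        }
    }
}

/// Builds the guarded expression "if d is non-zero then 10 / d else 0".
fn guarded_div(d: i64) -> Toy {
    Toy::IfNonZero {
        cond: Box::new(Toy::Lit(d)),
        then: Box::new(Toy::Div(Box::new(Toy::Lit(10)), Box::new(Toy::Lit(d)))),
        otherwise: Box::new(Toy::Lit(0)),
    }
}

/// Mimics an unsafe rewrite: `10 / d` is computed up front, before the guard.
fn eval_with_hoisting(d: i64) -> Result<i64, String> {
    let hoisted = eval(&Toy::Div(Box::new(Toy::Lit(10)), Box::new(Toy::Lit(d))))?;
    Ok(if d != 0 { hoisted } else { 0 })
}
```

With `d = 0`, `eval(&guarded_div(0))` succeeds because the division is never reached, while `eval_with_hoisting(0)` fails. Returning `true` from `short_circuits` is how a function signals that such rewrites should be avoided.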

- /// Computes the output interval for a [`ScalarUDFImpl`], given the input
+ /// Computes the output [`Interval`] for a [`ScalarUDFImpl`], given the input
/// intervals.
///
/// # Parameters
@@ -569,9 +571,11 @@ pub trait ScalarUDFImpl: Debug + Send + Sync {
Interval::make_unbounded(&DataType::Null)
}
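
As an illustration of computing an output interval from input intervals, here is a minimal sketch for `abs(x)`. `ToyInterval` and `abs_bounds` are hypothetical and are not DataFusion's `Interval` API; the sketch assumes closed integer bounds that stay away from `i64::MIN`, so negation cannot overflow.

```rust
/// Toy closed interval `[lo, hi]`; a stand-in for illustration, not DataFusion's `Interval`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ToyInterval {
    lo: i64,
    hi: i64,
}

/// Output bounds of `abs(x)` given bounds on `x`.
/// Assumes `x.lo <= x.hi` and that neither bound is `i64::MIN`.
fn abs_bounds(x: ToyInterval) -> ToyInterval {
    if x.lo >= 0 {
        // Entirely non-negative: abs is the identity.
        x
    } else if x.hi <= 0 {
        // Entirely non-positive: abs mirrors the interval.
        ToyInterval { lo: -x.hi, hi: -x.lo }
    } else {
        // Straddles zero: the minimum is 0, the maximum is the larger magnitude.
        ToyInterval { lo: 0, hi: (-x.lo).max(x.hi) }
    }
}
```

A function that cannot say anything useful about its output would instead fall back to an unbounded interval, as the default implementation in this hunk does.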

- /// Updates bounds for child expressions, given a known interval for this
- /// function. This is used to propagate constraints down through an expression
- /// tree.
+ /// Updates bounds for child expressions, given a known [`Interval`] for this
+ /// function.
+ ///
+ /// This function is used to propagate constraints down through an
+ /// expression tree.
///
/// # Parameters
///
@@ -620,20 +624,25 @@ pub trait ScalarUDFImpl: Debug + Send + Sync {
}
}
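
Propagating a known output interval back down to child expressions can be sketched for a two-argument sum. This is a toy model under stated assumptions: `ToyInterval` and `propagate_add` are hypothetical names, intervals are closed integer ranges, and overflow is ignored. It is not the signature or behavior of DataFusion's own constraint propagation.

```rust
/// Toy closed interval `[lo, hi]`; a stand-in for illustration, not DataFusion's `Interval`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ToyInterval {
    lo: i64,
    hi: i64,
}

impl ToyInterval {
    /// Intersection of two closed intervals, or `None` if they do not overlap.
    fn intersect(self, other: ToyInterval) -> Option<ToyInterval> {
        let (lo, hi) = (self.lo.max(other.lo), self.hi.min(other.hi));
        (lo <= hi).then_some(ToyInterval { lo, hi })
    }
}

/// Given a known interval `out` for `x + y`, tightens the bounds of `x` and `y`.
/// Returns `None` if no values of `x` and `y` can satisfy the constraint.
fn propagate_add(
    out: ToyInterval,
    x: ToyInterval,
    y: ToyInterval,
) -> Option<(ToyInterval, ToyInterval)> {
    // x = (x + y) - y, so x must lie in [out.lo - y.hi, out.hi - y.lo].
    let new_x = x.intersect(ToyInterval { lo: out.lo - y.hi, hi: out.hi - y.lo })?;
    // Symmetrically for y, using the already tightened bounds of x.
    let new_y = y.intersect(ToyInterval { lo: out.lo - new_x.hi, hi: out.hi - new_x.lo })?;
    Some((new_x, new_y))
}
```

For example, if `x` and `y` are each known to lie in `[0, 10]` and their sum is known to lie in `[15, 20]`, both children tighten to `[5, 10]`.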

- /// Whether the function preserves lexicographical ordering based on the input ordering
+ /// Returns true if the function preserves lexicographical ordering based on
+ /// the input ordering.
+ ///
+ /// For example, `concat(a, b)` preserves lexicographical ordering, but `abs(a)` does not.
fn preserves_lex_ordering(&self, _inputs: &[ExprProperties]) -> Result<bool> {
Ok(false)
}
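
The `concat` versus `abs` example from the doc comment can be checked on sample data with a short sketch. All names here (`is_non_decreasing`, `concat_rows`, `abs_values`) are hypothetical helpers, and the sample rows deliberately use equal-length strings for the first column, which is an assumption of this illustration.

```rust
/// Returns true if `values` is in non-decreasing order.
fn is_non_decreasing<T: PartialOrd>(values: &[T]) -> bool {
    values.windows(2).all(|pair| pair[0] <= pair[1])
}

/// Applies a toy `concat(a, b)` to each `(a, b)` row.
fn concat_rows(rows: &[(&str, &str)]) -> Vec<String> {
    rows.iter().map(|(a, b)| format!("{a}{b}")).collect()
}

/// Applies a toy `abs(a)` to each value.
fn abs_values(values: &[i64]) -> Vec<i64> {
    values.iter().map(|v| v.abs()).collect()
}
```

Rows such as `("aa", "x"), ("aa", "y"), ("ab", "a")` are sorted lexicographically by `(a, b)` and remain sorted after concatenation, whereas the sorted input `-3, -1, 2` is no longer sorted after `abs`.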

/// Coerce arguments of a function call to types that the function can evaluate.
///
- /// This function is only called if [`ScalarUDFImpl::signature`] returns [`crate::TypeSignature::UserDefined`]. Most
- /// UDFs should return one of the other variants of `TypeSignature` which handle common
- /// cases
+ /// This function is only called if [`ScalarUDFImpl::signature`] returns
+ /// [`crate::TypeSignature::UserDefined`]. Most UDFs should return one of
+ /// the other variants of [`TypeSignature`] which handle common cases.
///
/// See the [type coercion module](crate::type_coercion)
/// documentation for more details on type coercion
+ ///
+ /// [`TypeSignature`]: crate::TypeSignature
///
/// For example, if your function requires a floating point argument, but the user calls
/// it like `my_func(1::int)` (i.e. with `1` as an integer), `coerce_types` can return `[DataType::Float64]`
/// to ensure the argument is converted to `1::double`
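
The `my_func(1::int)` example can be modeled with a stand-alone sketch. `ToyType` and `coerce_to_float` are hypothetical and only mimic the idea of mapping the argument types a caller supplied to the types the function can evaluate; the real `coerce_types` method works on DataFusion's `DataType` and error types, which are not reproduced here.

```rust
/// Toy data types; a stand-in for illustration, not arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int64,
    Float64,
    Utf8,
}

/// Toy coercion for a hypothetical one-argument function that requires a
/// floating point input: integers are widened to `Float64`, other types are rejected.
fn coerce_to_float(arg_types: &[ToyType]) -> Result<Vec<ToyType>, String> {
    match arg_types {
        [ToyType::Int64 | ToyType::Float64] => Ok(vec![ToyType::Float64]),
        [other] => Err(format!("cannot coerce {other:?} to a floating point type")),
        _ => Err(format!("expected 1 argument, got {}", arg_types.len())),
    }
}
```

Calling `coerce_to_float(&[ToyType::Int64])` yields `[Float64]`, which corresponds to the planner inserting a cast so that `1::int` is evaluated as `1::double`.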
@@ -677,8 +686,8 @@ pub trait ScalarUDFImpl: Debug + Send + Sync {

/// Returns the documentation for this Scalar UDF.
///
- /// Documentation can be accessed programmatically as well as
- /// generating publicly facing documentation.
+ /// Documentation can be accessed programmatically and is also used to
+ /// generate publicly facing documentation.
fn documentation(&self) -> Option<&Documentation> {
None
}