|
| 1 | +<!--- |
| 2 | + Licensed to the Apache Software Foundation (ASF) under one |
| 3 | + or more contributor license agreements. See the NOTICE file |
| 4 | + distributed with this work for additional information |
| 5 | + regarding copyright ownership. The ASF licenses this file |
| 6 | + to you under the Apache License, Version 2.0 (the |
| 7 | + "License"); you may not use this file except in compliance |
| 8 | + with the License. You may obtain a copy of the License at |
| 9 | +
|
| 10 | + http://www.apache.org/licenses/LICENSE-2.0 |
| 11 | +
|
| 12 | + Unless required by applicable law or agreed to in writing, |
| 13 | + software distributed under the License is distributed on an |
| 14 | + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| 15 | + KIND, either express or implied. See the License for the |
| 16 | + specific language governing permissions and limitations |
| 17 | + under the License. |
| 18 | +--> |
| 19 | + |
| 20 | +# Why Is List Item Always Nullable? |
| 21 | + |
| 22 | +## Motivation |
| 23 | + |
| 24 | +There were independent proposals to make the `nullable` setting of list |
| 25 | +items in accumulator state be configurable. This meant adding additional |
| 26 | +fields which captured the `nullable` setting from schema in planning for |
| 27 | +the first argument to the aggregation function, and the returned value. |
| 28 | + |
| 29 | +These fields were to be added to `StateFieldArgs`. But then we found out |
| 30 | +that aggregate computation does not depend on it, and it can be avoided. |
| 31 | + |
| 32 | +This document exists to make that reasoning explicit. |
| 33 | + |
| 34 | +## Background |
| 35 | + |
| 36 | +The list data type is used in the accumulator state for a few aggregate |
| 37 | +functions like: |
| 38 | + |
| 39 | +- `sum` |
| 40 | +- `count` |
| 41 | +- `array_agg` |
| 42 | +- `bit_and`, `bit_or` and `bit_xor` |
| 43 | +- `nth_value` |
| 44 | + |
| 45 | +In all of the above cases the data type of the list item is equivalent |
| 46 | +to either the first argument of the aggregate function or the returned |
| 47 | +value. |
| 48 | + |
| 49 | +For example, in `array_agg` the data type of item is equivalent to the |
| 50 | +first argument and the definition looks like this: |
| 51 | + |
| 52 | +```rust |
| 53 | +// `args` : `StateFieldArgs` |
| 54 | +// `input_type` : data type of the first argument |
| 55 | +let mut fields = vec![Field::new_list( |
| 56 | + format_state_name(self.name(), "nth_value"), |
| 57 | + Field::new("item", args.input_type.clone(), true /* nullable of list item */ ), |
| 58 | + false, // nullable of list itself |
| 59 | +)]; |
| 60 | +``` |
| 61 | + |
| 62 | +For all the aggregates listed above, the list item is always defined as |
| 63 | +nullable. |
| 64 | + |
| 65 | +## Computing Intermediate State |
| 66 | + |
| 67 | +By setting `nullable` (of list item) to be always `true` like this we |
| 68 | +ensure that the aggregate computation works even when nulls are |
| 69 | +present. The advantage of doing it this way is that it eliminates the |
| 70 | +need for additional code and special treatment of nulls in the |
| 71 | +accumulator state. |
| 72 | + |
| 73 | +## Nullable Of List Itself |
| 74 | + |
| 75 | +The `nullable` of list itself depends on the aggregate. In the case of |
| 76 | +`array_agg` the list is nullable(`true`), meanwhile for `sum` the list |
| 77 | +is not nullable(`false`). |
0 commit comments