Should Morel's collection types indicate that a collection is sorted, unique, or has repeatable iteration order? #232
Replies: 1 comment
-
Let's extend the proposal to address the remarks:
We propose that there is an
but its value is not constant. Its value changes from one row to the next, and you will get a compilation error if you access it from an invalid location (such as the
returns 6 as you would expect. And a record can have an
To implement To implement
If
Queries on unordered lists (
Zip join can be expressed by assigning
The compiler should be smart enough to use an O(n) join algorithm, similar to ListPair.zip. |
Beta Was this translation helpful? Give feedback.
-
Should Morel's collection types indicate that a collection is sorted, unique, or has repeatable iteration order? The relational model and functional programming languages both have collection types, but there is a mismatch. Relations are unordered (unless a query has an
ORDER BY
clause), programming language lists are ordered. Uniqueness is complicated (relations are unique in Datalog but not in SQL, and if you choose a unique collection such as Java'sHashSet
orTreeSet
it has implications for iteration order).We propose to add a
multiset
type to Morel, a new clauseunordered
. We do not propose to deal with uniqueness or sort order at this time.Specifically:
multiset
, an unordered equivalent oflist
.from
expression and ajoin
clause can iterate over bothlist
andmultiset
collections.from
expression is alist
if it ends withorder
followed by zero or moreyield
orwhere
clauses, or if it reads from alist
followed by by zero or moreyield
orwhere
clauses;multiset
otherwise.unordered
.Remarks:
list
,multiset
has alength
method that returns the number of elements. You should not assume that its running time is O(1).multiset
does not have thenth (l, i)
,take (l, i)
ordrop (l, i)
methods oflist
because thei
th element is not definedmultiset
. (Nullable columns becomeoptional
, since Morel has no null value.)order
clause returns an ordered collection even if the sort-keys are not exhaustive. In particular,order
with no arguments returns a list whose order is undefined but whose order will be the same each time the list is read.order
is stable. We will probably declare that it is not stable.list
does not make it deterministic. If you run it several times, the order of items in thelist
may change.Set
andSortedSet
, which do precisely that.) It is important to know if a collection's elements are unique or sorted, the key(s) or sort key of a collection, and if there are functional or referential dependencies, but this should be done using constraints rather than by the type system.UNNEST WITH ORDINAL
.)Examples of ordered and unordered queries.
Rationale
Let's review collection types in various languages.
Standard ML
Standard ML's main collection type is
list
. A list's iteration order is defined and is deterministic. If you add elements in a certain order, iterations will always appear in the same order. It's as if each item had a zero-based ordinal; it doesn't, but there is more information in a list than an unsorted collection containing the same elements.A list can contain duplicate elements.
Like all collections we will be discussing, lists are finite and immutable. Size can be determined in constant time.
There is also
[vector](https://smlfamily.github.io/Basis/vector.html)
, which is similar tolist
but whosesub (v, i)
function allows constant time access to thei
th element.Java
Java has the following collection types. All are finite and have a constant-time
size
operation. Each has mutable and immutable implementations.List
has a defined order and may contain duplicates.Set
does not have a defined order and may not contain duplicates.SortedSet
has a deterministic order (determined by its comparator) but the creator of theSortedSet
cannot dictate the order.Relational
In the academic relational model (and Datalog), relations do not contain duplicates, are finite and immutable, have no deterministic iteration order.
Size is not assumed to be constant-time (but that's an implementation concern, and there may be a materialized view or other data structure that computes size quickly).
Because iteration is not deterministic, there is no operation to get the
i
th element.The industrial relational model (used by SQL) is the same as the academic relational model, except that relations may contain duplicates.
Comparing the approaches
In the relational model, not everything we know about the data is encoded as part of the type. For example, given the relation employees (empno, ename, deptno, dname) we may know that
empno
is primary key, and thatdeptno
functionally determinesdname
. We know that this relation is unique (because of the primary key) but iteration order is still nondeterministic.In the relational model (including in the SQL standard), sorting is generally performed last, just before outputting to a screen. Iteration order remains unknown until that point. (There is one important exception: the
ARRAY
column type has a determined order.)In our proposal, a Morel query returns an ordered collection in similar circumstances to a relational query. A SQL query with an
ORDER BY
clause is sorted, and a Morel query ending withorder
returns a list.Relational systems make much use of constraints, and constraints are separate from schema (type system). Likewise, we intend to add constraints to Morel, and these will be used to represent sort-order, uniqueness and functional dependencies, and intra-row constraints (e.g.
job = 'SALES' orelse commission = 0)
. We intend to allow constraints to be explicitly declared, but will also be deduced (e.g. uniqueness following agroup
clause, or intra-row constraints following awhere
clause), may be checked at run time (using something likeassert
), and the programmer can check that they can be deduced (using aprove
keyword that is analogous toassert
but works at compile time).Beta Was this translation helpful? Give feedback.
All reactions