-
Notifications
You must be signed in to change notification settings - Fork 20
ProposedChanges
ag's Notes: These are the notes that dubiousjim sent me about his proposed changes to the standard Pure library, which can be found at https://github.com/dubiousjim/unspoiled. (Please note that the commits in dubiousjim's repo are relative to a revision of the Pure sources which is several months old, so they won't apply cleanly to the current trunk. But I plan to merge at least some of these changes back into the current version.)
There are some nice additions in there, such as a coarse-grained equality predicate
==?
implementing Baker's EGAL, an alternative implementation of the dictionary and set data types based on 2-3 trees, an implementation of weak references, optimizations of existing library functions, as well as a bunch of useful new functions operating on lists and other containers. So integrating this stuff into the standard library should be a worthwhile endeavour.
On the other hand, many of the changes are also fairly disruptive, and while dubiousjim has put considerable effort into maintaining backward compatibility, there might be bugs and incompatibilities with existing code. This means that integrating the changes will take time and effort, in order not to break existing functionality in the standard library which is fairly mature and well-tested at this point. The purpose of this page is to provide some materials so that we can begin discussing and implementing these changes.
by dubiousjim
Consider the two lists:
let cell = ref 0;
[set[1,2,3,4],cell], [set[4,3,2,1],cell];
Perhaps an odd combination, but this demonstrates the issue in a concise
way. There's a natural sense in which these lists are equivalent, since the two
sets are equivalent and the two lists contain physically the same reference
cell. However, none of Pure's existing predicates express that equivalence.
Comparing the lists with ==
will not fully reduce, since ==
is not defined
on reference cells (not even when a reference cell is compared to itself).
Comparing the lists with ===
will return false, since although the sets
are ==
, their backend implementing trees have different structure.
We are adding to Pure a new pair of predicates, ==?
and ~=?
, that express
the kind of equivalence exhibited by these lists. You can think of ===
as
expressing fine-grained structural equivalence, and ==?
as expressing an
equivalence that may be coarser-grained in ways the user specifies, as we do
for sets. Roughly speaking, ==?
tests whether ==
is defined on its
operands, and if so, returns that result; else it falls back to whatever ===
returns. (This is only "roughly speaking" because ==?
tries to recurse
intelligently.) Unlike ==
, ==?
is totally defined. Unlike ===
, you can
override ==?
when introducing new types: you do so by implementing a
definition of ==
for those types.
It's natural to wonder, why not just make this the behavior of ==
? We could
define reference cells to be ==
when they are ===
, and we could work out a
way to make ==
totally defined while still providing hooks for the user to
override it for new types. That would also be a possible strategy, but Albert
presented some good reasons for not doing things that way. In ordinary cases,
though, you needn't confront any three-way choice between ===
, ==?
and
==
. If you want fine-grained structural equality, use ===
. If you want a
coarser-grained equality that counts set[1,2,3,4]
and set[4,3,2,1]
as
equivalent, use ==?
. You can use ==
instead if you know that ==
will be
defined for all of your object's components. But ==?
should agree with ==
in all cases where they're both defined.
Pure has a member
function for its collections like sets. But for some time
it has omitted any definition of member
for lists, largely because it wasn't
clear what equality predicate would be appropriate. If you wanted to use ===
,
you could write any (value===) xs
; if you wanted to use ==
, you'd write
any (value==) xs
, or piggyback onto index
, which was defined using ==
.
Now that ==?
is available, we think it does now make sense to provide
member
for lists, defined in terms of ==?
.
We're also redefining index
in terms of ==?
as well. (This should not break
any code unless it relied on index
failing to reduce in some cases.)
If you hate the identifier ==?
and want to advocate using equal
or
something else as an infix operator, now is the time to say so, before this new
predicate gets widely deployed.
We're also providing a delete
function for lists, similar to the existing
delete
function for collections like sets. The signature of this function is:
delete xs value
and it will return a version of xs that omits its first element (if any) that
satisfies (value==?). This behavior is also expressible in terms of other
list functions, for example, using the new rmfirstby
function defined
below it's:
rmfirstby (value==?) xs
But we thought it'd be useful to provide the specific behavior of delete
as a
separate function, especially since this is already provided for other
collections.
We scratched our heads for a while over an apparent conflict in the signature patterns for lists and collections: in list functions, the list tends to come last, whereas in collection function, the collection tends to come first. Then we realized that there is in fact a consistent pattern to Pure's existing functions, which all the new functions we're now adding can also conform to. Here is what the pattern looks like, abstractly:
operation function_or_predicate seed_or_base aggregate key_or_value
No Pure function exhibits all of these elements at the same time, but here are how several functions exhibit pieces of it:
foldl f a xs
filter p xs
index xs x
delete xs x
It's just an accident of which functions we've had available for lists, and which functions for collections, that made it look like lists would always come last and collections always come first.
Pure has the functions take n xs
and drop n xs
. Sometimes it's useful to
have both of these results at once, and it's natural to retrieve that pair of
results using only a single traversal of the list. We offer this function as
split n xs
. Other languages use names like splitAt
, split-at
, and
stream-split
here.
This pattern is useful in some other cases as well. So we offer the function
span p xs
to return the pair of takewhile p xs
and dropwhile p xs
, more
efficiently than if you called both of those functions explicitly. And we offer
the function partition p xs
to return the pair of filter p xs
and filter ((~).p) xs
.
filter p xs
gives you all the members of the list that satisfy p
.
Sometimes it's useful to retrieve only the first member of the list that does
so. One way to achieve that is by using dropwhile ((~).p) xs
, and then check
whether the resulting list has a head. When the predicate p
is just checking
for equality to a specific value, you could also use index
or member
. But
it is useful to have a version of this that works for general predicates. On
occasion, it's also useful to retrieve the last member of the list that
satisfies the predicate (and to do so more efficiently than by reversing the
list and retrieving the first). So we provide two new groups of functions:
-
firstby p default xs returns the first x that satisfies p, or else default
-
rmfirstby p xs returns xs with the first x that satisfies p (if any) removed
-
pickfirstby p xs returns the pair of the previous two expressions, or else () if no x satisfies p
-
lastby p default xs returns the last x that satisfies p, or else default
-
rmlastby p xs returns xs with the last x that satisfies p (if any) removed
-
picklastby p xs returns the pair of the previous two expressions, or else () if no x satisfies p
You can think of firstby
and rmfirstby
as being versions of the first
and
rmfirst
function for ordered collections, that operate with a predicate
rather than a specific position in the collection. Haskell also uses by
with
something roughly akin to this meaning.
A variety of different signatures are possible for these functions. Here are some alternatives:
- Instead of (),
pickfirstby p xs
might return the original xs in the case that no x satisfies p - Or it might take a
default
argument likefirstby
does - Or both of these functions might omit the default argument and throw an exception
- Or
firstby
might omit the default argument, and return either {| x |} or {}, where {| |} is the recently-introduced nonsplicing vector constructor. In this case,pickfirstby p xs
would return either {| x, newxs |} or {}. Here{}
plays the role of Haskell'sNothing
, and any non-empty vector plays the role of Haskell'sJust
.
If we chose the last strategy, that would naturally invite returning
multiple values from split
and span
and partition
using vectors instead
of (,)-separated tuples, as well. That has some advantages and some
disadvantages. For the time being, we're favoring the more conservative
approach of using (,)-separated tuples throughout. But we welcome your feedback
on this, and also on the question of whether any of the other alternative
signatures listed for firstby
and pickfirstby
look more attractive to you.
We surveyed a broad range of names for these functions, including alternatives
like find
, find_last
, pop
, remove
, and so on. Many of the alternatives
look better in some uses but are hard to fit into a satisfying pattern,
while also respecting other names already present in Pure. If you have opinions
about the names proposed here, we'd be glad to hear them. We don't think the
names proposed here are far-and-away superior to all natural alternatives; but we
have considered many such, and we do have reasons for preferring these.
One easy switch, though, would be to use findfirstby
instead of firstby
.
We provide a few curried nonsplicing vector and nonsplicing tuple constructors:
vector1 x = {| x |};
vector2 x y = {| x, y |};
vector3 x y z = {| x, y, z |};
tuple2 x y = '(x, y);
tuple3 x y z = '(x, y, z);
Observe that x,()
reduces to x
, and (x,y),z
reduces to x,(y,z)
; but
tuple2 x ()
is x,()
and tuple2 (x,y) z
is (x,y),z
. Suppressing the
natural splicing behavior of (,) in this way is useful for some purposes, such
as the defintions of zip
(see item 6, below) and the pick* functions.
Many list functions have been moved from prelude.pure into a new file
lists.pure, which the prelude automatically includes. In addition to the
functions listed above (member
, delete
, split
, span
, partition
, the
firstby/rmfirstby/pickfirstby triad and the corresponding triad for last), we
have also added or modified the following list functions:
-
foldl2, foldl3, foldr2, foldr3 fold 2 or 3 lists in parallel, stopping with the shortest
-
catstream [xs, ys, ...] efficient version of
cat [stream xs, stream ys, ...]
: you might prefer this toxs+ys
orcat [xs, ys]
whenxs
isn't already a stream Python calls this functionchain
-
rotate xs efficient version of
last xs:init xs
, providing both results from a single traversal -
zip xs ys Now joins the head of the two lists using (nonsplicing) tuple2, rather than (,). Formerly: zip [(1,2),()] [3,4] would return
[(1,(2,3)), 4]
which couldn't be unzipped without error. And disregarding the final element, even if we could unzip it we wouldn't get our original lists back, since(1,2),3
has been flattened into1,(2,3)
. Now zipping those two lists will instead return[((1,2),3), ((),4)]
, and unzipping that result will return exactly the original lists.
The same changes have been made for zip3. If you want a version of zip that
uses vectors instead of (,)-tuples, you can use: zipwith vector2 xs ys
or
zipwith3 vector3 xs ys zs`.
- reverse_onto, map_onto, catmap_onto,
- zip_onto, zip3_onto, zipwith_onto, zipwith3_onto
These are most easily explained with some examples:
reverse [1,2,3] // evaluates to [3,2,1]
reverse_onto [] [1,2,3] // evaluates to [3,2,1]
reverse_onto [40,50] [1,2,3] // evaluates to [3,2,1,40,50]
map succ [1,2,3] // evaluates to [2,3,4]
map_onto succ [] [1,2,3] // evaluates to [2,3,4]
map_onto succ [40,50] [1,2,3] // evaluates to [2,3,4,40,50]
As per Pure's standing policy, all of these functions are also implemented for strings.
Some new conditional compilation flags have been introduced.
If you run Pure with --enable=list-opt
, then many of the existing and new
list functions will switch to the skip-ahead implementations Dubiousjim
described in an earlier message to the mailing list. For longer lists, these
implementations can be 15% or so faster, and use substantially less memory,
which still being safe from stack overflow. By default, this flag is disabled,
and Pure uses the existing implementations, or implementations in the same
style, which achieve stack safety by accumulating the result in an auxiliary
list that gets reversed at the end.
If you run Pure with --enable=trees23
, then Dubiousjim's balanced tree
implementation is used instead of the older avltrees. Again, this can
be modestly faster and use less memory. By default, this flag is
disabled, and Pure still uses avltrees.
These flags should only affect performance. The semantics of your programs shouldn't be affected. If you find evidence to the contrary, please let us know about it.
Some new functions have been added to the collections libraries.
It was already the case that list (myset)
would convert a set (or a bag or
dict) into a list. Now one can also do stream (myset)
to convert the
collection directly into a stream. (No list intermediary needs to be built.)
Additionally, foldl
, foldl1
, foldr
, and foldr1
have been implemented
for all collections.
Additionally the following have been implemented for various collections:
-
get, get_last For bags and mdicts, where (!k) currently returns all values associated with that key, these will instead return the first-added value (which is also listed first in the enumeration) and the last-added value (listed last in the enumeration)
-
delete_last For bags and mdicts, where
delete ... k
anddelete_all ... k
currently delete the first-added value, or all values, associated with that key -
pick This complements the existing functions (!) and
delete
. For bags and mdicts it complementsget
rather than (!) -
pick_all, pick_last These complement (!) and
get_last
for bags and mdicts -
pick_val This complements the existing function delete_val for dicts
-
pickfirst/picklast Unlike pick and pickfirstby, these don't take a key or predicate, but just return the first member of an ordered collection, paired with the rest of the collection. They complement the existing functions first/last, rmfirst/rmlast
-
filter
-
partition
-
any/all
We would eventually like to port other functionality from the lists library to arrays, including:
- firstby/rmfirstby/pickfirstby, and likewise for last
- take/drop/split
- takewhile/dropwhile/span
- zipwith*, zipwith3*, unzip, unzip3
- map*, and do/dowith/dowith3
- cat/(+)/catmap*
- reverse*
- the fold2 and fold3 functions
- rotn n::int _::array, where n>0 rotates to right, n<0 to left
- perhaps index/member/delete
- [scan and sort omitted for now]