Clojure functions and data structures for working with datasets that exceed available RAM.
The core primitives include:
- External sorting: sorts an arbitrarily large collection in constant space and returns a lazy seq of the results. Intermediate results are transparently spilled to disk, with items serialized via Fressian.
- ExternalSet is a datatype representing a file-backed set data structure, implementing clojure.lang.IPersistentSet. Values are transparently serialized via Fressian and stored in a Riffle.
- Operations are provided for set intersection, union, shuffling, and more.
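
To make the external sorting idea concrete, here is a minimal sketch of a spill-and-merge sort in plain Clojure. It is not this library's implementation or API: the names `external-sort-sketch`, `spill-chunk`, `read-chunk`, and `merge-sorted` are hypothetical, and it writes EDN text lines to temp files rather than using Fressian. It only illustrates the general technique of sorting fixed-size chunks, spilling each to disk, and lazily merging the sorted runs.

```clojure
(ns example.external-sort
  (:require [clojure.edn :as edn]
            [clojure.java.io :as io])
  (:import (java.io BufferedReader File)))

;; Hypothetical helper: sort one in-memory chunk and spill it to a temp file,
;; one EDN value per line (the real library serializes with Fressian instead).
(defn- spill-chunk ^File [chunk]
  (let [f (File/createTempFile "sort-chunk" ".edn")]
    (.deleteOnExit f)
    (with-open [w (io/writer f)]
      (doseq [x (sort chunk)]
        (.write w (str (pr-str x) "\n"))))
    f))

;; Hypothetical helper: lazily read a spilled chunk back, closing the reader
;; once the last line has been consumed.
(defn- read-chunk [^File f]
  (let [^BufferedReader r (io/reader f)
        step (fn step [lines]
               (lazy-seq
                 (if-let [s (seq lines)]
                   (cons (edn/read-string (first s)) (step (rest s)))
                   (do (.close r) nil))))]
    (step (line-seq r))))

;; Lazy k-way merge of already-sorted seqs. Each step scans the heads of all
;; live seqs, which is O(k) per item; a priority queue would scale better.
(defn- merge-sorted [seqs]
  (lazy-seq
    (let [live (vec (keep seq seqs))]
      (when (seq live)
        (let [i (reduce (fn [best j]
                          (if (neg? (compare (first (live j))
                                             (first (live best))))
                            j
                            best))
                        0
                        (range 1 (count live)))]
          (cons (first (live i))
                (merge-sorted (assoc live i (rest (live i))))))))))

;; Sort `coll` while holding at most `chunk-size` items in memory during the
;; spill phase, then return a lazy seq that merges the spilled runs.
(defn external-sort-sketch [chunk-size coll]
  (->> (partition-all chunk-size coll)
       (mapv spill-chunk)
       (map read-chunk)
       merge-sorted))
```

For example, `(external-sort-sketch 3 [5 2 9 1 7 3 8])` spills three sorted runs and merges them into `(1 2 3 5 7 8 9)`. Note that this sketch keeps one open reader per chunk during the merge, so very large inputs would need a multi-pass merge or a larger chunk size.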
See the docs for the full API.
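
Because ExternalSet is described as implementing clojure.lang.IPersistentSet, code written against the standard set interface should, in principle, accept one. The sketch below uses an ordinary in-memory set as a stand-in, since the ExternalSet constructor is not shown here; `summarize-set` is a hypothetical helper, and the cost of each call on a file-backed set is not something this example can confirm.

```clojure
;; Hypothetical helper: works on anything implementing IPersistentSet,
;; using only read-side operations from the core set interface.
(defn summarize-set [s probe]
  {:pre [(instance? clojure.lang.IPersistentSet s)]}
  {:count     (count s)          ; number of elements
   :contains? (contains? s probe) ; membership test
   :lookup    (get s probe)})     ; the stored element, or nil if absent

;; Stand-in: a regular hash set instead of a file-backed ExternalSet.
(summarize-set #{"apple" "pear"} "apple")
;; => {:count 2, :contains? true, :lookup "apple"}
```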