sections/discussion.tex

\cleardoublepage
\section{Discussion}\label{sec:discussion}

The following two subsections qualitatively review the contribution with respect to the stated goals, and expose limitations of the current implementation as well as discuss flaws inherent to the design.

\subsection{Advantages}

\paragraph{Expressive power.} The minimal code size shows the expressive power of the language and its standard library of immutable data structures.

\paragraph{Database as a value.} Being able to treat the entire database as an immutable value which can be passed around the program and queried locally is a powerful affordance which slashes incidental complexity.

\paragraph{Strong auditability.} Accreting assertions and retractions of facts in an immutable log gives complete auditability of all changes with no programmatic overhead.

\paragraph{Flexible data model.} Attribute-oriented data models can represent sparse data much more efficiently than traditional RDBMS, at the cost of frequent joins \cite{gobel2019optimising}. Together with first class transactions and a notion of memory over time, EAV+$t_x$+$t_v$ appears to be a promising data model for complex domains.

\paragraph{Simple bitemporality.} Relaxing requirements of e.g. complex temporal queries and instead focusing on auditability yields a simple yet useful design. The system defers answering hard questions about e.g. the handling of future-dated facts and efficient bitemporal indexing as they are not a core concern.

\paragraph{Library, not a framework.} Apart from enforcing data to be fully denormalized, the presented design places no boundaries on miscellaneous concerns or structure of the embedding application. A developer is free to compose this "data layer as a library" with any other libraries or frameworks as they see fit.

\paragraph{Easily extensible.} Customizing the behavior of the database is simple, as exemplified by the implementation of the \lisp{document} function which allows atomic ingestion of multiple facts related to an entity as arbitrarily nested documents. Another example would be the client side use of a single database to store both local \gls{UI} state and "global" facts from the server, the only change necessary is to filter out attributes with a \lisp{:local} namespace before transacting the change to the server.


\subsection{Limitations}

\paragraph{Query engine.} No negation, no aggregations, no pull, no multi-way joins, no destructuring -- the minimalist implementation of the query language presented in this thesis does not even deserve comparison to a proper Datalog.

\paragraph{No schema.} Determining if fact was superseded needs upfront specification of cardinality. A production system would need to define a schema and at least enforce cardinalities on each attribute, i.e., the number of values that can be considered "current" for an attribute, e.g. that a person can have \emph{many} phone numbers but only \emph{one} birthdate.

\paragraph{Coupled storage.} The design of the storage layer makes no provisions for offloading responsibilities to a generic storage backend or even durable persistence via local disks.

\paragraph{Coupled transactor.} Splitting out the transactor component as in Datomic, with the goal of increasing availability in a failover configuration, is not possible in this design.

\paragraph{Limited subscriptions.} Clients have to explicitly subscribe and manage the lifecycle of available publications, which requires duplicate, imperative, and error-prone programming effort. Another limitation of the implementation is that a client can only be subscribed to one query at a time. The mechanism to notify clients of updated facts, being critical for security and privacy, was designed ad-hoc and has not been tested thoroughly.

\paragraph{Time-aware queries.} A complex query might want to mix different timelines in single query, e.g. join previous values with current values. This is not possible in the simplistic presented design. Datalog, however, has been extended with temporal semantics in various works \cite{alvaro2010dedalus, aref2015design}.

\paragraph{Stand-alone queries.} The query engine provides no means for abstracting, parameterizing, and composing fragments of queries in a \emph{hygienic} way so that expected lexical scoping semantics of logic variables are observed.

\paragraph{Dynamic typing.} During implementation and refactoring of the prototype, a clear majority of bugs were caused by a mismatch of expected vs. actual structure of data types being passed around. While the highly dynamic nature of the language enables extremely quick experimentation, a small number of functions would have benefitted from a lightweight layer of gradual typing, as afforded by e.g. \lisp{clojure.spec} \cite{pinzaru2019towards}.

\paragraph{Incompatibility.} Simultaneously changing the data model, its representation and storage formats, the interfaces and protocols, query languages and semantics requires doing away with proliferated standards, decades of established research, and proven implementations of the real world.

\paragraph{Switching costs.} Since persistence of data is fundamental to any application, changes to the data layer of existing systems are prohibitively expensive. Yet, the same dynamic proves advantageous in green field projects unencumbered by legacy decisions.

\paragraph{Privacy regulations.} No production system can be deployed without a way to permanently erase personal data. The concept of \emph{excision} as in Datomic, or the use of two separate data storage components with only hashes being written to immutable logs as in Crux present clean ways around this issue.

\paragraph{Open vs. closed system.} Incidental complexity within a closed system such as the one described in this work can be managed to stay at a minimal level because any outside components the system is interacting with is forced to adapt to the system's way. However, users of such a system eventually want to connect it to other things. Inevitably, doing so \emph{imports} some of the incidental complexities of those systems. It remains to be seen if open systems can be built without importing accidental complexity \cite{moffat16eve}.

\paragraph{Second system effect.} This work is the author's second attempt dealing with the problems of data layers for business applications. Brooks observed the tendency of successful systems to be succeeded by over-engineered, bloated systems, due to inflated expectations and overconfidence of the authors \cite{brooks1995mythical}.