\subsection{Goals}\label{sec:goals}
To reduce some of the complexities in question, \cite{tarpit} recommend adopting functional and declarative programming constructs, along with a fully relational data model. The described system should provide a flexible \emph{data layer} for small to medium-sized internal business applications that support distributed collaboration in near real time. It is optimized for developer productivity but still backed by relational guarantees.
\paragraph{Notion of memory.} The requirement of strict auditability of \emph{everything} implies keeping a queryable history of structured facts. An event-sourced data model with explicit time and memory, based on an accretion of facts, fits this requirement. Especially when the facts are structured in 6NF/EAV form, the combination affords a ``save everything, query later'' way of thinking, because the data is access-path independent.
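The accretion of facts can be illustrated with a minimal sketch. It is not the implementation of the described system: the names \texttt{Fact}, \texttt{FactLog}, \texttt{transact}, \texttt{history}, and \texttt{current} are hypothetical, attributes are assumed to be single-valued, and time is modeled as a simple transaction counter.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Tuple

Triple = Tuple[str, str, Any]  # (entity, attribute, value)


@dataclass(frozen=True)
class Fact:
    """One immutable EAV fact; `added=False` marks a retraction."""
    entity: str
    attribute: str
    value: Any
    tx: int            # logical transaction time (assumption: a counter)
    added: bool = True


class FactLog:
    """Append-only log: facts are only ever added, never updated in place."""

    def __init__(self) -> None:
        self._facts: List[Fact] = []
        self._tx = 0

    def transact(self, assertions: Iterable[Triple],
                 retractions: Iterable[Triple] = ()) -> int:
        """Append retractions, then assertions, under one transaction id."""
        self._tx += 1
        for e, a, v in retractions:
            self._facts.append(Fact(e, a, v, self._tx, added=False))
        for e, a, v in assertions:
            self._facts.append(Fact(e, a, v, self._tx, added=True))
        return self._tx

    def history(self, entity: str, attribute: str) -> List[Fact]:
        """Everything ever recorded about one entity/attribute pair."""
        return [f for f in self._facts
                if f.entity == entity and f.attribute == attribute]

    def current(self) -> Dict[Tuple[str, str], Any]:
        """Derive the present state by replaying the whole log."""
        view: Dict[Tuple[str, str], Any] = {}
        for f in self._facts:
            key = (f.entity, f.attribute)
            if f.added:
                view[key] = f.value
            elif view.get(key) == f.value:
                del view[key]
        return view
```

Because nothing is overwritten, the audit trail (\texttt{history}) and the present state (\texttt{current}) are both just queries over the same log.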
\paragraph{Streaming relational queries.}
Recall the querying semantics for data at rest in listing~\ref{lst:querying_data_at_rest}. What is needed for rich interactions is a different model of streaming queries for data in motion: What is ``just there'' is not the data but the query, which is \emph{instantiated in the network}; the data flows through the query instead of the other way around. Listing~\ref{lst:querying_data_in_motion} gives an example of a streaming query.
\begin{lstlisting}[label={lst:querying_data_in_motion},morekeywords={email-source,contacts-source},caption=Querying data in motion \cite{alvaro2015isee}]
($\Pi$ [:ip]
   ($\bowtie$ ($\sigma$ (classifier email :spam)
         email-source)
      contacts-source))
\end{lstlisting}
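One possible reading of listing~\ref{lst:querying_data_in_motion} is sketched below as a push-based operator pipeline. This is an illustration under stated assumptions, not the semantics defined by \cite{alvaro2015isee}: the operator functions, the join key \texttt{address}, the row fields, and the toy spam classifier are all hypothetical. The join is a symmetric hash join, so rows may arrive on either input in any order.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Sink = Callable[[Row], None]


def select(pred: Callable[[Row], bool], downstream: Sink) -> Sink:
    """sigma: forward only the rows that satisfy the predicate."""
    def push(row: Row) -> None:
        if pred(row):
            downstream(row)
    return push


def project(fields: List[str], downstream: Sink) -> Sink:
    """Pi: keep only the named fields of each row."""
    def push(row: Row) -> None:
        downstream({f: row[f] for f in fields})
    return push


def join(key: str, downstream: Sink) -> Tuple[Sink, Sink]:
    """Symmetric hash join: each side remembers its rows and probes the other."""
    left_seen: Dict[Any, List[Row]] = defaultdict(list)
    right_seen: Dict[Any, List[Row]] = defaultdict(list)

    def push_left(row: Row) -> None:
        left_seen[row[key]].append(row)
        for other in right_seen[row[key]]:
            downstream({**other, **row})

    def push_right(row: Row) -> None:
        right_seen[row[key]].append(row)
        for other in left_seen[row[key]]:
            downstream({**row, **other})

    return push_left, push_right


def is_spam(email: Row) -> bool:
    """Toy stand-in for the classifier in the listing (assumption)."""
    return "win" in email["subject"].lower()


def build_query(sink: Sink) -> Tuple[Sink, Sink]:
    """Instantiate the query once; return the email and contacts inputs."""
    out = project(["ip"], sink)
    spam_in, contacts_in = join("address", out)
    email_in = select(is_spam, spam_in)
    return email_in, contacts_in
```

The query is built once and then stays in place. Every call to one of the two returned inputs pushes a single row through it, and matching results reach the sink as soon as both join partners have been seen.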
\paragraph{Transactional guarantees.} A proper data layer for an information system must afford the same transactional guarantees of atomicity, consistency, isolation, and durability (ACID) as classical relational databases.
\paragraph{Adaptivity and dynamism.} Yet such a system must also be malleable and adaptive, and it must not require developers to settle on, e.g., a fixed database schema design upfront. Instead, the system must be able to quickly co-evolve with the domain it is serving.
\paragraph{Distribution.}
The handling of data between clients and servers should be as declarative as possible: Clients declare what data they are interested in, and the infrastructure takes care of fetching, caching, handling updates, and managing the lifecycle of the subscription. The same infrastructure must be written so that it runs on servers, web browsers, and mobile devices without modification.
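The declarative subscription idea can be sketched in a few lines. The sketch runs in a single process and leaves out networking, caching, and conflict handling; the class \texttt{Hub} and its methods are hypothetical names chosen for illustration, and interest is declared simply by attribute name.

```python
from typing import Any, Callable, Dict, List, Tuple

Fact = Tuple[Any, str, Any]          # (entity, attribute, value)
Callback = Callable[[Fact], None]


class Hub:
    """Clients declare interest; the hub handles delivery and lifecycle."""

    def __init__(self) -> None:
        self._facts: List[Fact] = []
        self._subs: Dict[int, Tuple[str, Callback]] = {}
        self._next_id = 0

    def subscribe(self, attribute: str, callback: Callback) -> int:
        """Register interest in an attribute and replay matching facts."""
        self._next_id += 1
        sub_id = self._next_id
        self._subs[sub_id] = (attribute, callback)
        for fact in self._facts:     # initial fetch of existing data
            if fact[1] == attribute:
                callback(fact)
        return sub_id

    def unsubscribe(self, sub_id: int) -> None:
        """End the subscription; unknown ids are ignored."""
        self._subs.pop(sub_id, None)

    def publish(self, entity: Any, attribute: str, value: Any) -> None:
        """Store a fact and push it to every interested subscriber."""
        fact = (entity, attribute, value)
        self._facts.append(fact)
        for attr, callback in list(self._subs.values()):
            if attr == attribute:
                callback(fact)
```

The client never asks ``what changed?''. It states its interest once, receives the existing data, and is then kept up to date until it unsubscribes.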
\subsection{Non-goals}\label{sec:nongoals}
\paragraph{Efficiency.}
No attention is paid to the efficiency of compute and memory usage. Tradeoffs are almost always made in favor of a clear mapping between the conceptual model and the implementation of the proof of concept. The only major optimization is the triple index structure, and even that doubles as the simplest possible way to access arbitrary data without the overhead of parsing and executing a query.
Custom indexing strategies, e.g. ways to maintain a phonetic index to query for people's names, do not need to be part of the database design, because the triple indexing scheme combined with the ability to derive facts (explained in section~\ref{sec:implementation}) is general enough to allow arbitrary access to the data in an efficient-enough manner without having to declare indices upfront.
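As a rough illustration of this argument, the sketch below keeps two orderings of the same triples and stores a derived phonetic key as an ordinary fact, so that a lookup by sound needs no dedicated index. The class \texttt{TripleIndex}, the attribute names, and the crude \texttt{phonetic\_key} function are assumptions made for the example; they do not describe the derivation mechanism of section~\ref{sec:implementation}.

```python
from collections import defaultdict
from typing import Any, Set


class TripleIndex:
    """The same triples, reachable by entity (EAV) and by value (AVE)."""

    def __init__(self) -> None:
        self.eav = defaultdict(lambda: defaultdict(set))
        self.ave = defaultdict(lambda: defaultdict(set))

    def add(self, entity: Any, attribute: str, value: Any) -> None:
        self.eav[entity][attribute].add(value)
        self.ave[attribute][value].add(entity)

    def values(self, entity: Any, attribute: str) -> Set[Any]:
        """Direct lookup: all values of one attribute of one entity."""
        return set(self.eav.get(entity, {}).get(attribute, ()))

    def entities(self, attribute: str, value: Any) -> Set[Any]:
        """Reverse lookup: all entities with the given attribute value."""
        return set(self.ave.get(attribute, {}).get(value, ()))


def phonetic_key(name: str) -> str:
    """Crude stand-in for a phonetic code (not Soundex or Metaphone):
    keep the first letter, drop later vowels and h/w/y, collapse repeats."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0]
    for c in letters[1:]:
        if c in "aeiouyhw":
            continue
        if c != key[-1]:
            key += c
    return key


def add_person(index: TripleIndex, entity: Any, name: str) -> None:
    """Store the name and, as a derived fact, its phonetic key."""
    index.add(entity, "person/name", name)
    index.add(entity, "person/name-phonetic", phonetic_key(name))
```

A search for people whose name sounds like ``Mayr'' then becomes a plain reverse lookup on the derived attribute, for example \texttt{index.entities("person/name-phonetic", phonetic\_key("Mayr"))}.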
\paragraph{First class bitemporality.}
Another non-goal is the efficiency, primacy, and expressive power of the provided bitemporal affordances. A large body of previous work covers relatively complex attempts to add efficient bitemporal semantics to relational databases \cite{snodgrass1996adding,jensen1999temporal,kulkarni2012temporal}, notably for use in bitemporal constraints \cite{doucet1997using} or within complex queries, as well as bitemporality as a concept in production rule systems \cite{aref2015design}. Bitemporality in the described system is only secondary. Access paths are optimized for the most recent view of the data, while bitemporality is meant to be used for infrequent auditing purposes. There is no bitemporal index; consequently, a query with temporal modifiers is allowed to cause a sequential scan of the log.
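The tradeoff can be made concrete with a small sketch, assuming integer timestamps and single-valued attributes. The names \texttt{Store}, \texttt{get}, and \texttt{get\_as\_of} and the two time fields are hypothetical. The latest view is served from a dictionary, whereas a temporal query walks the entire log.

```python
from typing import Any, Dict, List, NamedTuple, Optional, Tuple


class Fact(NamedTuple):
    entity: str
    attribute: str
    value: Any
    valid_at: int      # when the fact became true in the domain
    recorded_at: int   # when the system learned about it


class Store:
    def __init__(self) -> None:
        self.log: List[Fact] = []                        # full history
        self.latest: Dict[Tuple[str, str], Fact] = {}    # fast path only

    def record(self, fact: Fact) -> None:
        """Append to the log and refresh the most-recent view."""
        self.log.append(fact)
        key = (fact.entity, fact.attribute)
        current = self.latest.get(key)
        if current is None or fact.valid_at >= current.valid_at:
            self.latest[key] = fact

    def get(self, entity: str, attribute: str) -> Optional[Any]:
        """Most recent value: a single dictionary lookup."""
        fact = self.latest.get((entity, attribute))
        return fact.value if fact else None

    def get_as_of(self, entity: str, attribute: str,
                  valid_at: int, recorded_at: int) -> Optional[Any]:
        """Value as valid at `valid_at`, as known at `recorded_at`.
        No temporal index exists, so this scans the whole log."""
        best: Optional[Fact] = None
        for fact in self.log:
            if (fact.entity, fact.attribute) != (entity, attribute):
                continue
            if fact.valid_at > valid_at or fact.recorded_at > recorded_at:
                continue
            if best is None or ((fact.valid_at, fact.recorded_at)
                                >= (best.valid_at, best.recorded_at)):
                best = fact
        return best.value if best else None
```

The cost of \texttt{get\_as\_of} grows linearly with the length of the log, which is acceptable here only because such queries are expected to be rare audit operations.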
\paragraph{Standard protocols.}
Lastly, the design explicitly avoids compliance with widespread data-handling standards such as SQL, \gls{XML}, \gls{JSON}, and \gls{REST}. This allows quick exploration of different paradigms, unburdened by past decisions.