Commit 3495b31: Final changes
albertzak committed May 31, 2020 (1 parent: d3a7d1e)
Showing 8 changed files with 12 additions and 11 deletions.
1 change: 1 addition & 0 deletions glossary.tex
@@ -36,3 +36,4 @@
\newacronym{UUID}{UUID}{Universally Unique Identifier}
\newacronym{ECS}{ECS}{Entity-Component Systems}
\newacronym{RPC}{RPC}{Remote Procedure Call}
+\newacronym{PACELC}{PACELC}{Partitions: Availability or Consistency, Else Latency or Consistency}
2 changes: 1 addition & 1 deletion sections/conclusion.tex
@@ -4,7 +4,7 @@ \section{Future Work}
\paragraph{Safe concurrent editing.}
A distributed system expects connection loss and simultaneous conflicting edits. Especially when distributing state not just within the data center, but towards clients which thereby become (limited) database peers and are free to perform queries and writes at any time, the question arises of how such distribution can be achieved safely over (very) unreliable networks with partitions lasting, e.g., for days. While theorems such as CAP and PACELC fundamentally stake out the playing field, common current data layers preempt the designer's choices and appear to make austere and conservative decisions in favor of consistency, requiring coordination where not strictly needed, thus unnecessarily limiting availability.

-Conflict-free replicated data types (CRDTs) and \gls{OT} are concepts which appear to provide various composable consistency primitives for robust replication \cite{weilbach2015replikativ, weilbach2016decoupling}, generally trading space efficiency for the ability to perform arbitrary \emph{merges} via an idempotent, associative, and commutative function. The result is that partitioned nodes can safely to continue to perform changes while guaranteeing that the resulting states can be reconciled later. A variety of commonly used data types have been implemented in a conflict-free variant. Since some of them may grow extremely quickly compared to their unsafe twins, garbage collection and epochal compaction of such data types remain areas of active research.
+\gls{CRDT} and \gls{OT} are concepts which appear to provide various composable consistency primitives for robust replication \cite{weilbach2015replikativ, weilbach2016decoupling}, generally trading space efficiency for the ability to perform arbitrary \emph{merges} via an idempotent, associative, and commutative function. The result is that partitioned nodes can safely continue to perform changes while guaranteeing that the resulting states can be reconciled later. A variety of commonly used data types have been implemented in a conflict-free variant. Since some of them may grow extremely quickly compared to their unsafe twins, garbage collection and epochal compaction of such data types remain areas of active research.
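
To make the merge requirement concrete, here is a minimal Clojure sketch (not taken from the cited systems) of a grow-only set, one of the simplest CRDTs. Its merge function is plain set union, which is idempotent, associative, and commutative, so replicas may merge in any order and any number of times:

    (require '[clojure.set :as set])

    ;; G-Set merge is set union: idempotent, associative, commutative.
    (defn merge-replicas [a b]
      (set/union a b))

    ;; Two partitioned replicas accept writes independently ...
    (def replica-a #{[:person/phone "123"]})
    (def replica-b #{[:person/phone "456"]})

    ;; ... and reconcile later; order and repetition do not matter.
    (merge-replicas replica-a (merge-replicas replica-b replica-a))
    ;; => #{[:person/phone "123"] [:person/phone "456"]}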

In a system like the one presented in this work, it should at least be possible to define a schema that specifies a distribution tradeoff for each attribute as dictated by the \gls{CAP} theorem. These constraints must propagate to the clients and elicit the data layer to restrict the possible operations on the data item in question given the current network conditions \cite{emerick2014api}.
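
One hypothetical shape for such a schema, with every attribute and option name invented for illustration (none of this is the prototype's API):

    ;; Hypothetical schema: each attribute declares its side of the
    ;; CAP tradeoff. All names below are invented for illustration.
    (def schema
      {:person/phone    {:distribution :available}    ; offline writes allowed, merged later
       :account/balance {:distribution :consistent}}) ; writes require coordination

    ;; The client-side data layer could consult the schema before
    ;; accepting a local write while partitioned from the server.
    (defn write-allowed? [schema attribute online?]
      (or online?
          (= :available (get-in schema [attribute :distribution]))))

    (write-allowed? schema :account/balance false) ;; => false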

2 changes: 1 addition & 1 deletion sections/design.tex
@@ -101,6 +101,6 @@ \subsection{Query language}\label{sec:query_language}

\paragraph{Publication and subscription.}

-One of the goals states that clients should be able to declaratively subscribe to the \emph{live result set} of a query. The results and the query itself will change over the duration of a client's session. Each change triggers an immediate re-render of the UI. Conceptually, clients \emph{install} their \emph{subscription queries} on the server, and the infrastructure will re-run the subscription query whenever the underlying data changes and notify the client of the changed results. The design does not prescribe whether or not to replicate past (i.e. superseded or retracted) facts, thus greatly simplifying the proof-of-concept implementation by deferring concerns such as diffing, authorization, and the decision of what exactly to replicate to the clients to the developer customizing this data layer to their use case.
+One of the goals states that clients should be able to declaratively subscribe to the \emph{live result set} of a query. The results and the query itself will change over the duration of a client's session. Each change triggers an immediate re-render of the UI. Conceptually, clients \emph{install} their \emph{subscription queries} on the server, and the infrastructure will re-run the subscription query whenever the underlying data changes and notify the client of the changed results. The design does not prescribe whether or not to replicate past (i.e., superseded or retracted) facts, thus greatly simplifying the proof-of-concept implementation by deferring concerns such as diffing, authorization, and the decision of what exactly to replicate to the clients to the developer customizing this data layer to their use case.
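
A naive Clojure sketch of this conceptual flow (the function names are illustrative, not the prototype's API):

    ;; Every committed transaction re-runs all installed subscription
    ;; queries and pushes fresh results. query-fn and send! stand in
    ;; for the query engine and the transport, respectively; diffing
    ;; and authorization are deliberately left to the developer.
    (defonce subscriptions (atom {})) ; client-id -> query

    (defn install-subscription! [client-id query]
      (swap! subscriptions assoc client-id query))

    (defn notify-subscribers! [db query-fn send!]
      (doseq [[client-id query] @subscriptions]
        (send! client-id (query-fn db query))))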

\paragraph{Security.} While extreme dynamism may be warranted in a high-trust environment, a real-world application may interact with malicious entities and thus needs a means to restrict queries on the server side. In a real-world application, clients would need to authenticate themselves and the server would authorize publication based on access rules. There is no simple way to statically analyze queries submitted by the client for safety properties, but the server can control which facts are allowed to be replicated to a client. A publication might, for example, choose not to replicate facts with specific attributes, or transform facts to censor parts of the value.
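
An illustrative sketch of such a server-side publication; the attribute names and the fact shape ({:e ... :a ... :v ...}) are assumptions made for this example:

    ;; Drop facts carrying sensitive attributes and censor part of a
    ;; value before replication to a client.
    (def hidden-attributes #{:person/ssn :person/password-hash})

    (defn censor [{:keys [a v] :as fact}]
      (if (= a :person/birthdate)
        (assoc fact :v (subs (str v) 0 4)) ; replicate only the year
        fact))

    (defn publish [facts]
      (->> facts
           (remove #(hidden-attributes (:a %)))
           (map censor)))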
2 changes: 1 addition & 1 deletion sections/discussion.tex
@@ -25,7 +25,7 @@ \subsection{Limitations}

\paragraph{Query engine.} No negation, no aggregations, no pull, no multi-way joins, no destructuring -- the minimalist implementation of the query language presented in this thesis does not even deserve comparison to a proper Datalog.

-\paragraph{No schema.} Determining if fact was superseded needs upfront specification of cardinality. A production system would need to define a schema and at least enforce cardinalities on each attribute, i.e. the number of values that can be considered "current" for an attribute, e.g. that a person can have \emph{many} phone numbers but only \emph{one} birthdate.
+\paragraph{No schema.} Determining whether a fact was superseded requires upfront specification of cardinality. A production system would need to define a schema and at least enforce cardinalities on each attribute, i.e., the number of values that can be considered ``current'' for an attribute, e.g., that a person can have \emph{many} phone numbers but only \emph{one} birthdate.
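
A sketch of such cardinality-aware supersession; the schema shape and fact keys are hypothetical:

    (def schema
      {:person/birthdate {:cardinality :one}
       :person/phone     {:cardinality :many}})

    (defn current-values [schema facts attribute]
      (let [matching (filter #(= attribute (:a %)) facts)]
        (case (get-in schema [attribute :cardinality])
          :one  (take-last 1 (sort-by :tx matching)) ; newest assertion wins
          :many matching)))                          ; all assertions are current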

\paragraph{Coupled storage.} The design of the storage layer makes no provisions for offloading responsibilities to a generic storage backend or even durable persistence via local disks.

4 changes: 2 additions & 2 deletions sections/implementation.tex
@@ -143,7 +143,7 @@ \subsection{Writing}

\paragraph{ACID transactions.}
Thanks to Clojure's immutable data structures, implementing transactions that respect \gls{ACID} guarantees is almost trivial.
-The pure function \lisp{(transact db transaction txe now)} takes a database value, a transaction vector, a new unique entity \lisp{txe} and the current time which is to be used as transaction time $t_x$. The \lisp{txe} value is usually either a globally incrementing number or a randomly generated string like a \gls{UUID} version 4. It is up to the impure calling code on the server to generate and supply these two values. The \lisp{transact} function returns a \emph{new} database value with the transaction committed, i.e. appended to the log and with the indices updated. Transacting is a nested multi-step process:
+The pure function \lisp{(transact db transaction txe now)} takes a database value, a transaction vector, a new unique entity \lisp{txe}, and the current time, which is to be used as transaction time $t_x$. The \lisp{txe} value is usually either a globally incrementing number or a randomly generated string like a \gls{UUID} version 4. It is up to the impure calling code on the server to generate and supply these two values. The \lisp{transact} function returns a \emph{new} database value with the transaction committed, i.e., appended to the log and with the indices updated. Transacting is a nested multi-step process:

\begin{itemize}
\item \lisp{commit}: Generates another data structure which describes the commit about to happen. The same arguments as to \lisp{transact} are passed along, resulting in the commit data structure depicted in listing~\ref{lst:commit_example}:
@@ -297,7 +297,7 @@ \subsection{Querying}

\cleardoublepage
\paragraph{Step 3: Decide index to use.}
-The query clauses are now parsed into a sieve which is ready to be evaluated over an index. Descending an index means successively going from knowledge to answer, i.e. evaluation needs to start with an index which has as its top level value one of the constants provided in the query. In other words, the lvar that appears at the same position within each query clause is the \emph{joining variable}. The index which to descend is the one that keeps this joining variable at its leaves \cite{rubin15aosadb}. This is a design limitation, as a full query language should support arbitrary transitive joins, leveraging multiple indices. Refer to table \ref{tbl:lvartoindex} for clarification.
+The query clauses are now parsed into a sieve which is ready to be evaluated over an index. Descending an index means successively going from knowledge to answer, i.e., evaluation needs to start with an index which has as its top-level value one of the constants provided in the query. In other words, the lvar that appears at the same position within each query clause is the \emph{joining variable}. The index to descend is the one that keeps this joining variable at its leaves \cite{rubin15aosadb}. This is a design limitation, as a full query language should support arbitrary transitive joins, leveraging multiple indices. Refer to table \ref{tbl:lvartoindex} for clarification.
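
A sketch of that dispatch as one reading of the adapted table (verify against table \ref{tbl:lvartoindex} before relying on the exact mapping):

    ;; The position of the joining variable within the clauses selects
    ;; the index whose leaf level holds that variable.
    (defn choose-index [joining-variable-position]
      (case joining-variable-position
        :entity    :AVET   ; leaves hold entities
        :attribute :VEAT   ; leaves hold attributes
        :value     :EAVT)) ; leaves hold values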

\begin{table}
\caption{Mapping of joining variable position to query index, slightly adapted from \cite{rubin15aosadb}}
4 changes: 2 additions & 2 deletions sections/introduction.tex
@@ -21,10 +21,10 @@ \section{Introduction}
\cleardoublepage

\subsection{Problem}
-Classic Relational Database Management Systems (RDBMS) are not suitable for modeling sparse, hierarchical or multi-valued data in domains where evolution and dynamism are hard requirements. They also lack a notion of memory or change over time because writes are destructive by default. Attempts to add concepts of temporality to Structured Query Language (SQL) are complicated and as a result not widely used. Auditing changes to a mutable black box with no history is hard. Developers fear the networking aspects of sending queries on a round trip "over there" \cite{hickey2012dbvalue} to the database, as opposed to having the data in memory and being able to query it directly. On top of all, customers demand increasingly interactive distributed collaboration environments, which pushes the limits of established request-response mechanisms.
+Classic Relational Database Management Systems (RDBMS) are not suitable for modeling sparse, hierarchical, or multi-valued data in domains where evolution and dynamism are hard requirements. They also lack a notion of memory or change over time because writes are destructive by default. Attempts to add concepts of temporality to Structured Query Language (SQL) are complicated and, as a result, not widely used. Auditing changes to a mutable black box with no history is hard. Developers fear the networking aspects of sending queries on a round trip ``over there'' \cite{hickey2012dbvalue} to the database, as opposed to having the data in memory and being able to query it directly. On top of it all, customers demand increasingly interactive distributed collaboration environments, which pushes the limits of established request-response mechanisms.

\subsection{Contribution}
-This thesis explains a possible design of a data layer for business applications and describes its prototypal implementation. The core is a simple relational data model based on facts in the form of Entity-Attribute-Value (EAV) tuples similar to the \gls{6NF} or the \gls{RDF} as seen in Semantic Web literature. Such EAV tuples are accreted in an append-only (immutable) log as assertions and retractions together with two timestamps: \gls{tx} at which the fact was added to the log, and \gls{tv} at which the fact became true in the wider context of the system \cite{snodgrass1992temporal} i.e. the real world.
+This thesis explains a possible design of a data layer for business applications and describes its prototypal implementation. The core is a simple relational data model based on facts in the form of Entity-Attribute-Value (EAV) tuples similar to the \gls{6NF} or the \gls{RDF} as seen in Semantic Web literature. Such EAV tuples are accreted in an append-only (immutable) log as assertions and retractions together with two timestamps: \gls{tx}, at which the fact was added to the log, and \gls{tv}, at which the fact became true in the wider context of the system \cite{snodgrass1992temporal}, i.e., the real world.
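
An illustrative shape of a single such fact in Clojure; the keys are invented for illustration:

    ;; tx = transaction time (when the fact entered the log),
    ;; tv = valid time (when it became true in the real world) --
    ;; tv may precede tx, as here.
    {:e  :person-42
     :a  :person/address
     :v  "Linke Wienzeile 4"
     :op :assert ; or :retract
     :tx #inst "2020-05-31T10:00:00.000-00:00"
     :tv #inst "2020-05-01T00:00:00.000-00:00"}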


\paragraph{Goal.}
2 changes: 1 addition & 1 deletion sections/related_work.tex
@@ -16,7 +16,7 @@ \section{Related Work}\label{sec:related_work}
\cleardoublepage
Reads are by default served from the client's local copy of the data, giving an AP read path that may result in the reading of stale data but remains performant and available. The application developer can choose to apply writes either optimistically to the local database, which Meteor later replicates to the server (which may issue corrections), thus yielding AP characteristics with last-write-wins pseudo-resolution semantics (previous writes are lost because there is no concept of time); or let the server process the transaction via a Remote Procedure Call (RPC) and wait for a quorum of the database servers before confirming and replicating to the clients, thus yielding CP semantics.

-The major problem of Meteor is its absolute dependence on MongoDB and the ensuing lack of a relational data model, i.e. no joins and requiring to denormalize data, leading to incidental complexity of the application code and the $n+1$ query problem. Its Minimongo implementation lacks indices and does not support all of MongoDB's query capabilities. Meteor's publication/subscription mechanism does not scale well to many clients or publication queries returning a large number of documents. This is because the server keeps a working copy of all documents present in each connected client's local database in memory and creates a diff for each client on each database update. As a result, Meteor's publication/subscription mechanism is stateful and requires sticky sessions, meaning that an upstream load balancer must take care to always forward a client's requests to the same backend server.
+The major problem of Meteor is its absolute dependence on MongoDB and the ensuing lack of a relational data model, i.e., no joins and the need to denormalize data, leading to incidental complexity of the application code and the $n+1$ query problem. Its Minimongo implementation lacks indices and does not support all of MongoDB's query capabilities. Meteor's publication/subscription mechanism does not scale well to many clients or to publication queries returning a large number of documents. This is because the server keeps a working copy of all documents present in each connected client's local database in memory and creates a diff for each client on each database update. As a result, Meteor's publication/subscription mechanism is stateful and requires sticky sessions, meaning that an upstream load balancer must take care to always forward a client's requests to the same backend server.

Preliminary load testing and tracing of performance problems during typical workloads encountered in the author's medical information system concluded that the performance of Meteor's publication/subscription system starts to degrade noticeably with more than 300 clients connected to one server process on moderate dedicated server hardware, or when a client subscribes to a single publication containing more than around 100,000 documents. Clients subscribing to such large collections have intermittently caused server processes to hang permanently at 100\% processor usage until forcibly restarted, an issue which has been known for at least two years\footnote{Meteor issue \#9796 on GitHub: \url{https://github.com/meteor/meteor/issues/9796}}. Despite these shortcomings, Meteor is production-grade software with extraordinary developer ergonomics and has served as the backbone of the author's business, replicating terabytes of data extremely reliably since it was first put into production five years ago.
