Skip to content

Variable Ownership

Kinan Bab edited this page Apr 27, 2023 · 4 revisions

So far, we have some examples of data that has a fixed number of owners, where the number of associated data subjects are fixed and defined in the schema. For example, a chat message in a social network that has one sender and one receiver. Here, we look at an example where the number of owners is dynamic, and can change according to the data itself.

We will look at an example adopted from our ownCloud experiment, which we lightly edit for clarity.

This example revolves around a file sharing applications. Users in this application can upload files and share them with other users or with groups of users.

Let's start with the users table, which is similar to the ones we saw in previous examples, and insert some data into it.

CREATE DATA_SUBJECT TABLE users (
  ID INT,
  name TEXT,
  PRIMARY KEY(ID)
);

INSERT INTO users VALUES (1, 'Alice');
INSERT INTO users VALUES (2, 'Bob');

Groups and membership

In this application, users can create groups, and add or remove others to them. The group itself is simple and only contains an ID and a name.

-- groups is an SQL keyword, we use usergroups instead.
CREATE TABLE usergroups (
  ID INT,
  name TEXT,
  PRIMARY KEY(ID)
);

On the other hand, the association of users and groups is a classic many-to-many relationship. In SQL databases, this is often represented using an auxiliary members table with two foreign keys, one pointing towards a user, and another pointing towards a group the user is a member of. A user can have multiple rows in this auxiliary table, representing membership in many groups. Similarly, a group may have multiple rows when it has several members.

From a policy perspective, the records of this auxiliary table are owned by the users they correspond to. This is easy to express in K9db using OWNED_BY. Additionally, it is often a reasonable policy to consider a group to be co-owned by all its members, so that the group is only deleted when all its members request deletion. However, this policy cannot be represented using only OWNED_BY annotations: there is no outgoing foreign key in the usergroups table. Instead, the association is established via an incoming foreign key from the auxiliary members table.

Thus, K9db provides an OWNS annotation, which expresses an ownership association in the opposite direction of the foreign key it is applied to. Using OWNS, we can now specify the a row in the members table owns the group it points to. Since, the row in the members table is owned by the user it is associated with, this makes the user also own the target group transitively.

CREATE TABLE members (
  ID INT,
  user_id INT,
  group_id INT,
  PRIMARY KEY(ID),
  FOREIGN KEY (user_id) OWNED_BY users(ID),
  FOREIGN KEY (group_id) OWNS usergroups(ID)
);

Compliance transactions

Naturally, applications have to insert data into the usergroups table before they are able to insert corresponding data into the members table, due to the referential integrity of the group_id foreign key. In fact, K9db actively enforces this integrity and will error out when it is violated.

However, this results in a scenario where a newly inserted group is (perhaps temporarily) not associated with any users, and thus unowned. While future insertions may indeed introduces some members to that group, we cannot guarantee that these insertions will indeed occur or succeed, in which case the group remains unowned. Leaving data that is supposed to be owned by some data subjects in an orphaned state is problematic: this data cannot be deleted or retrieved via any SAR. In the case where such data contains sensitive information about some users, this violates compliance.

To protect against this, K9db checks that newly inserted data is associated with some users before committing the insert. It also checks that SQL operations, such as UPDATE or DELETE, do not create orphaned data, K9db will reject a DELETE operation against the members table that leaves one or more groups with no members (and thus no owners).

INSERT INTO usergroups VALUES (1, 'group 1');
> Produces a fatal error in the K9db server:
> C-Wrapper: INTERNAL: Orphaned data created outside of compliance transaction!

Developers can temporarily disable this check by using compliance transactions (CTX). Similar to an SQL transaction, a CTX allows statements to temporarily violate invariants, provided that they restore them at the end of the transaction. If a CTX remains in violation after it is finished, K9db will reject it and produce an error.

CTX START;
> Query OK, 0 rows affected (0.000 sec)

INSERT INTO usergroups VALUES (1, 'Group 1');
> Query OK, 1 row affected (0.008 sec)

INSERT INTO members VALUES (1, 1, 1);
> Query OK, 3 rows affected (0.004 sec)

CTX COMMIT;
> Query OK, 0 rows affected (0.000 sec)

Note that K9db does not support SQL transaction, and thus does not rollback the effects of a CTX. Instead, it produces an error notifying the developer of the violating statements and affected data, and relies on the developers resolving the issue themselves. Supporting SQL transactions and a more seamless rollback of CTX is a future enhancement.

CTX START;
> Query OK, 0 rows affected (0.000 sec)

INSERT INTO usergroups VALUES (2, 'Group 2');
> Query OK, 1 row affected (0.028 sec)

CTX COMMIT;
> K9db server crashes here producing the following error:
> C-Wrapper: INTERNAL: Orphaned data detected
> Developers need to restart K9db and issue a delete command removing the orphaned group

To reduce developer efforts, we provide a mechanism to automatically wrap sessions in compliance transactions. When this mechanism is used, all sessions automatically invoke CTX START when they are opened, and CTX COMMIT when they are closed. A session corresponds to a single active open connection to K9db. For example, everytime the mariadb client is used to connect to K9db from a different terminal, a new session is created, and the session is closed when the client is shutdown.

This may be suitable for applications where sessions are short lived, e.g. applications that open a new session for every application endpoint. On the other hand, this should not be used when sessions are long lived, e.g. applications that keep a pool of open sessions around and use them to handle any incoming end point, as such sessions may make a large number of updates to the database before closing (and thus committing the transaction), making recovering from a violation a lot harder.

-- All sessions will now be wrapped inside a compliance transaction
SET AUTO_CTX;

Files and sharing

Finally, users can upload files and share them with others either via a group or directly. ownCloud's policy gives the original creator of the file sole ownership of it, while only providing access rights to the users the file is shared with. This means that when the original creator requests deletion, the file and all records of it being shared with others are deleted. On the other hand, the file is preserved when some of the users it is shared with request deletion.

This is yet another example of a classic many to many relationship, respresented via a shares auxiliary table. However, instead of signifying ownership, this time, this relationship signifies access rights. K9db provides an ACCESSES annotation that mimics OWNS but exclusively confers access rights without impacting deletion.

Note that in ownCloud's schema, the original creator of the file is recorded in the shares table, instead of in the files table. We do the same in our schema to minimize changes to the application. This schema does not preclude a file from having several corresponding rows in shares, each specifying a different creator, and thus a different owner. Instead, this is an application level invariant that application code is responsible for respecting. In either cases, K9db will correctly maintain associations between files and their owners, and handle SARs accordingly.

CREATE TABLE files (
  ID INT,
  path TEXT,
  PRIMARY KEY (ID)
);

CREATE TABLE shares (
  ID INT,
  creator INT,
  share_with_user INT,
  share_with_group INT,
  PRIMARY KEY (ID),
  -- The creator owns the file.
  FOREIGN KEY (creator) OWNED_BY users(ID),
  -- Sharing with a user or a group only gives them access to the file
  FOREIGN KEY (share_with_user) ACCESSED_BY users(ID),
  FOREIGN KEY (share_with_group) ACCESSED_BY usergroups(ID)
);

Because the creator is defined inside the shares table, developers need to use compliance transactions when inserting a file and specifying its creator for the first time.

CTX START;
INSERT INTO files VALUES (1, 'file 1');
INSERT INTO shares VALUES (1, 1, NULL, NULL, 1);
CTX COMMIT;

-- Share file 1 with bob directly.
-- Application ensures creator is the same.
INSERT INTO shares VALUES (2, 1, 2, NULL, 1);

-- Share file 1 with members of group 1.
INSERT INTO shares VALUES (3, 1, NULL, 1, 1);

Alternative policy

An alternative policy may choose to grant ownership rights to all data subjects a file is associated with, either because they are the creator, the file is shared directly with them, or the file is shared with them via a group.

The primary metric for determining whether a policy is reasonable is whether or not it is compliant and reflects the spirit of the application. While this alternative policy may likely be compliant, it may be undesirable for many users, which may feel like they have lost control over their sensitive files.

Furthermore, this policy may be somewhat nonsensical depending on the application logic. For example, if the application only allows the creator to share or revoke the file with others. Then, the original policy of deleting the file when the creator is deleted is most consistent, since the creator is able to revoke the sharing with others anyways.

A secondary factor in deciding policy may be performance and complexity of reasoning about the policy. When using the alternative policy, K9db stores a copy of each row in files in the shards of every user the file is shared with. This may introduce undesirable storage overhead, and may slow down some queries that need to modify all these copies.

Nonetheless, K9db can support such a policy, as below.

CREATE TABLE shares (
  ID INT,
  creator INT,
  share_with_user INT,
  share_with_group INT,
  file_id INT,
  PRIMARY KEY (ID),
  FOREIGN KEY (creator) OWNED_BY users(ID),
  FOREIGN KEY (share_with_user) OWNED_BY users(ID),
  FOREIGN KEY (share_with_group) OWNED_BY usergroups(ID),
  FOREIGN KEY (file_id) OWNS files(ID)
);