From bf250097eeafd78401e891ba458ad84dd7068575 Mon Sep 17 00:00:00 2001 From: Brent Moran Date: Fri, 13 Oct 2023 23:46:51 +0800 Subject: [PATCH 01/24] add architecture pages --- docs/engineering/architecture/index.md | 96 ++++++++++ docs/engineering/architecture/models.md | 242 ++++++++++++++++++++++++ 2 files changed, 338 insertions(+) create mode 100644 docs/engineering/architecture/index.md create mode 100644 docs/engineering/architecture/models.md diff --git a/docs/engineering/architecture/index.md b/docs/engineering/architecture/index.md new file mode 100644 index 000000000..3abdfb9a2 --- /dev/null +++ b/docs/engineering/architecture/index.md @@ -0,0 +1,96 @@ +# Proposed Backend Architecture + +The main pieces of the back end code base are described here, and more depth is available in specific pages. + +## Initial Definitions + +We need a couple of terms to get started: + +- **web service**: the Django service +- **Django DB**: the DB (postgres or sqlite3) where Django keeps model data +- **User DB**: a DB owned by a Mathesar user, with their tables on it. This is a DB shown in the Mathesar UI. + +## Bird's Eye View + +The big goal of this architectural redesign is that we want to move away from using Django models to represent DB objects, and instead use functions to act on those DB objects directly. Insofar as we need to enrich the data about DB objects with other metadata, we'll do that after gathering the relevant info from the User DB. + +## Example: Getting the table info in a schema + +To get the table info in a schema, we call the endpoint `GET /api/db/v0/tables/`. We use a query string parameter to indicate we want to filter results from that endpoint based on a schema (identified by a Django-assigned integer id). Internally, then the following happens: + +1. Django builds a query that gets some `Table` model instances from the Django DB, filtered based on the desired schema, and runs it. This gets the following info for each table: + - `created_at` -- The date of createion of the table model instance (not the actual table) + - `updated_at` -- The date of last modification of the table model instance (not the actual table) + - `import_verified` -- Whether the import process was verified by the user for this table + - `is_temp` -- Whether this table is supposed to be copied into a preexisting table, then deleted + - `import_target_id` -- A preexisting table which should receive this table's data + - `schema_id` -- The Django id of the schema containing the table + - `data_files` -- A list of any data files imported to the table + - `settings` -- Some metadata describing how the table is displayed +1. We then gather the following info _for each table_ by querying the user DB. These queries are initiated by `@property` annotations in the Django models. + - `name` + - `description` -- The comment (description) of the table, defined in the User DB. + - `has_depenents` (many requests for this, actually) + - `columns` -- These are found by following a foreign key link on the Django DB. Each column model instance then runs a bunch of queries to gather relevant info from the user DB and Django DB. +1. For each column of each table, we gather the following info by querying the Django DB (can be batched): + - `display_options` -- These describe table-wide metadata about how to show the table in the UI. +1. For each column of each table, we gather the following info by querying the user DB, again with queries initiated by `@property` annotations on the Django `Column` model. For these, we end up making separate requests: + - `name` (multiple requests to user DB for each column) + - `type` + - `type_options` -- e.g., the precision specified for a `numeric` column + +All of this gets joined together, then sent back as a response from the API. + +With the new architecture, we'll instead: + +1. Call a PL/pgSQL function installed on the user database called `get_schema_table_details` It will run on the user database to gather + - `name` + - `description` + - `has_dependents` + - `columns` + - `name` + - `type` + - `type_options` + - `preview_settings` -- describes how we should show each table's rows when it's linked to by a foreign key +1. Using the returned info, filter a `TableMetadata` model based on returned `oid`s, and gather + - `import_verified` + - `is_temp` + - `import_target_id` + - `schema_id` + - `data_files` + - `column_order` -- describes the order in which columns should be displayed +1. Using the same returned info, filter a `ColumnMetadata` model based on returned `oid, attnum` pairs to gather (for each column) + - `display_options` + +We then join all of this together, and return it as a response from the API. + +The fundamental difference is that in the current version, we use foreign keys between Django models to find tables for the schema, then columns for each table. Then all queries on the User DB are initiated by functions on these model instances. In the new version, we instead run a query on the User DB to gather all relevant table and column info available on that DB, then enrich that data with metadata stored in non-foreign-key-linked metadata models in the Django DB. + +## Introduction to relevant layers + +The layers introduced here will be discussed in more detail in other sections. + +### User database + +For each API call, there should be an identifiable DB function that performs all User Database operations needed to satisfy that call. For example, + +- `GET /api/db/v0/tables/` should result in a call to some function `get_tables(sch_oid oid)` on the database. + +To achieve this, we will install the following on the user database(s) + +- Some custom Mathesar types +- Some tables holding metadata that needs to be kept near the tables +- A set of functions that provide the bulk of Mathesar's back end logic and functionality. + +### Python `db` library + +This library should mostly serve to provide thin wrapper functions around the User DB layer functions. These functions should take parameters from requests (never request objects themselves) and engines, and then call underlying DB functions. They should then pass the results up towards the API + +### Web service + +This service should provide an API and some views for use by the front end. When an API endpoint is called (or view is loaded), the service should: + +- Grab an appropriate engine from the `DatabaseConnection` model +- Call the relevant `db` library function (should be only one in most cases) +- Gathers data from the service database via models if needed (this is for metadata that's inappropriate for storage in the User DB for some reason) +- Returns it to the API diff --git a/docs/engineering/architecture/models.md b/docs/engineering/architecture/models.md new file mode 100644 index 000000000..e08e7f83d --- /dev/null +++ b/docs/engineering/architecture/models.md @@ -0,0 +1,242 @@ +# Models + +!!! warning "Warning" + This document is still a WIP. Mostly still in 'ad-hoc notes' format + +## Models (skipping unused batteries for now) + +### Column + +| Column | Type | +|------------------|--------------------------| +| id | integer | +| created\_at | timestamp with time zone | +| updated\_at | timestamp with time zone | +| attnum | integer | +| display\_options | jsonb | +| table\_id | integer | + +The only actual info here is the display options for a given column, stored as a JSON blob. Rename to `ColumnMetadata`, restructure to validate display options, delete fkey fields. Consider moving to `ma_catalog` table on user DB. + +We need to handle updating the table preview template when a new column is added (or rethink the implementation of this functionality) + +We need to replace functionality to get `ui_type` from DB type. + +To replace the dependent-getting functionality, we need to move the dependents module to SQL. + + +### Constraint + +| Column | Type | +|-------------|--------------------------| +| id | integer | +| created\_at | timestamp with time zone | +| updated\_at | timestamp with time zone | +| oid | integer | +| table\_id | integer | + +Nothing actually stored here. Delete this model. All functionality can be contained in User DB functions. + +### Database + +| Column | Type | +|-------------|--------------------------| +| id | integer | +| created\_at | timestamp with time zone | +| updated\_at | timestamp with time zone | +| name | character varying(128) | +| deleted | boolean | +| db\_name | character varying(128) | +| editable | boolean | +| host | character varying(255) | +| password | text | +| port | integer | +| username | text | + +Stores connection info to allow accessing a DB by creating an SQLAlchemy engine. + +Referenced by DatabaseRole and Schema models. + +Replace this with a `DatabaseConnection` model containing a DB hostname, port, database, and username. We should add functionality to store some details in a [`.pgpass`](https://www.postgresql.org/docs/current/libpq-pgpass.html) dotfile (though probably in a custom location). `psycopg` can inject the password automatically through these means. + +The Django permissions infrastructure should handle whether a user can access the DB at all in the Django layer (i.e., whether they can use the connection string), and other permissions on the DB should be handled by restricting the relevant user (referred to by the conn string) on the DB. For this to be sensible, we need a UI concept of a database user (beyond just connecting your Mathesar user to a database) This could be something like `mathesar@mathesar_tables` (in our default setup). _If_ we don't think that's feasible from a UX perspective, we should set up a database user for each Mathesar user, on each database, with minimal permissions (`CONNECT` and `CREATE`). + +### DatabaseRole + +| Column | Type | +|--------------|--------------------------| +| id | integer | +| created\_at | timestamp with time zone | +| updated\_at | timestamp with time zone | +| role | character varying(10) | +| database\_id | integer | +| user\_id | integer | + +This stores a role on a given database for a given user. We should delete this model, and instead have a many-to-many mapping between Mathesar users and database connections. All actual access should be handled by underlying DB permissions for the underlying user. + +### DataFile + +| Column | Type | +|-------------------------|--------------------------| +| id | integer | +| created\_at | timestamp with time zone | +| updated\_at | timestamp with time zone | +| file | character varying(100) | +| created\_from | character varying(128) | +| base\_name | character varying(100) | +| header | boolean | +| delimiter | character varying(1) | +| escapechar | character varying(1) | +| quotechar | character varying(1) | +| table\_imported\_to\_id | integer | +| user\_id | integer | +| type | character varying(128) | +| max\_level | integer | +| sheet\_index | integer | + +This stores metadata about files which have been uploaded for import into Mathesar. We should keep this model. `table_imported_to_id` should be removed (it's not used anywhere is it?). Also `max_level` seems like less of a data file attribute and more of an import setting. + +Request from Pavish: We should reduce the number of requests in the import process. Right now it's many. Consider an RPC endpoint. Brent agrees this is a good idea. + +### PreviewColumnSettings + +| Column | Type | +|-------------|--------------------------| +| id | integer | +| created\_at | timestamp with time zone | +| updated\_at | timestamp with time zone | +| customized | boolean | +| template | character varying(255) | + +This stores the template defining what should be shown in a referencing fkey column for this table. This would be _much_ better as a `ma_catalog` table for efficiency reasons. + +In that case, a table's preview settings would be "global", i.e., it would be attached to the table rather than a user, table pair. + +Referenced by TableSettings. + +### Schema + +| Column | Type | +|--------------|--------------------------| +| id | integer | +| created\_at | timestamp with time zone | +| updated\_at | timestamp with time zone | +| oid | integer | +| database\_id | integer | + +Nothing stored here. + +Referenced by SchemaRole and Table models. + +Delete this model. All permissions handled by the referencing SchemaRole should instead be handled by the underlying user's permissions on the actual schema in the DB + +### SchemaRole + +| Column | Type | +|------------|--------------------------| +| id | integer | +| created_at | timestamp with time zone | +| updated_at | timestamp with time zone | +| role | character varying(10) | +| schema_id | integer | +| user_id | integer | + +This should be deleted, and the permissions should be instead managed on the underlying DB. + +### SharedQuery + +| Column | Type | +|------------|--------------------------| +| id | integer | +| created_at | timestamp with time zone | +| updated_at | timestamp with time zone | +| slug | uuid | +| enabled | boolean | +| query_id | integer | + +This model should stay. No changes here. We need to add metadata about which user created the share, so we can use that user's allowed DB credentials for accessing the underlying assets (tables, schemata, etc.) + +### SharedTable + +| Column | Type | +|------------|--------------------------| +| id | integer | +| created_at | timestamp with time zone | +| updated_at | timestamp with time zone | +| slug | uuid | +| enabled | boolean | +| table_id | integer | + +Only change is that we need to refer directly to a table OID, and handle permissions. + +### Table + +| Column | Type | +|------------------|--------------------------| +| id | integer | +| created_at | timestamp with time zone | +| updated_at | timestamp with time zone | +| oid | integer | +| import_verified | boolean | +| is_temp | boolean | +| import_target_id | integer | +| schema_id | integer | + +Stores info about: + +- whether the initial data import for the table has been manually verified by a user or not, and +- whether the table is actually a temporary holder for data intended for a preexisting table. + +Referenced by Column, Constraint, DataFile, SharedTable, Table, TableSettings, and UIQuery models + +We should combine this with the `TableSettings` model to create a `TableMetadata` model that just has that info, and drop all fkeys and references. + +### TableSettings + +| Column | Type | +|---------------------|--------------------------| +| id | integer | +| created_at | timestamp with time zone | +| updated_at | timestamp with time zone | +| column_order | jsonb | +| preview_settings_id | integer | +| table_id | integer | + +This stores Mathesar-specific metadata about tables. Should be combined with remains of `Table` model. + +### UIQuery + +| Column | Type | +| ----------------|--------------------------| +| id | integer | +| created_at | timestamp with time zone | +| updated_at | timestamp with time zone | +| name | character varying(128) | +| description | text | +| initial_columns | jsonb | +| transformations | jsonb | +| display_options | jsonb | +| display_names | jsonb | +| base_table_id | integer | + +This stores a definition of a stored query that can be run on command. The main changes are that it should refer directly to DB-layer ids (oids and attnums) rather than Django-layer. + +### User + +| Column | Type | +| -----------------------|--------------------------| +| id | integer | +| password | character varying(128) | +| last_login | timestamp with time zone | +| is_superuser | boolean | +| username | character varying(150) | +| email | character varying(254) | +| is_staff | boolean | +| is_active | boolean | +| date_joined | timestamp with time zone | +| full_name | character varying(255) | +| short_name | character varying(255) | +| password_change_needed | boolean | + +This stores user metadata. I think we should mostly keep it as is, + From e57e1592afc78eef37790e7a1267ae8b37777a4d Mon Sep 17 00:00:00 2001 From: Brent Moran Date: Tue, 17 Oct 2023 17:09:59 +0800 Subject: [PATCH 02/24] Add permissions architecture page --- docs/engineering/architecture/index.md | 22 ++++++++++ docs/engineering/architecture/permissions.md | 45 ++++++++++++++++++++ 2 files changed, 67 insertions(+) create mode 100644 docs/engineering/architecture/permissions.md diff --git a/docs/engineering/architecture/index.md b/docs/engineering/architecture/index.md index 3abdfb9a2..ccdaba01b 100644 --- a/docs/engineering/architecture/index.md +++ b/docs/engineering/architecture/index.md @@ -94,3 +94,25 @@ This service should provide an API and some views for use by the front end. When - Call the relevant `db` library function (should be only one in most cases) - Gathers data from the service database via models if needed (this is for metadata that's inappropriate for storage in the User DB for some reason) - Returns it to the API + +## Permissions and users + +We should, whenever possible, derive the access to any Django model based on access to the underlying DB object in real time. Details are [here](./permissions.md). + +### Example + +A user lists the columns for a table. Because they have access to read the columns of the table, they can read (their) display options for a table. If they have access to modify a column of a table, they have access to modify the relevant display options. This works as long as their isn't a dedicated `display_options` endpoint which could receive requests directly. + +### Exceptions + +There are some metadata and other models that we'll be keeping which _can_ receive direct requests. Currently, these are: + +- Database connections +- Shareable links +- Explorations + +Access to these will be managed using the Django permissions framework (i.e., with access policies). + +### We should avoid + +We should absolutely avoid caching or storing permissions and access to Django models based on permissions for underlying DB models. For example, if we ever want a user to manage display options directly, they should own _any_ display option they've created (and have access to it) regardless of their access (or lack of it) to the underlying DB object. diff --git a/docs/engineering/architecture/permissions.md b/docs/engineering/architecture/permissions.md new file mode 100644 index 000000000..a9a8e3031 --- /dev/null +++ b/docs/engineering/architecture/permissions.md @@ -0,0 +1,45 @@ +# Users and Permissions + +The big picture is that we will implement access management for DB objects (including databases themselves) by giving admin users of Mathesar the ability to use database-layer permissions management tools (e.g., `GRANT`). For web service resources (e.g., Django model instances), permissions will be managed in Django. + +## Database Connection Model + +As described in [the models section](./models.md), This will store the information needed to create an engine and connect to a database. Because the information includes a role (to define the initial connection role), it necessarily defines a set of privileges available on the database with that connection. + +## Adding Connections (backend perspective) + +Regardless of UI, the backend should receive a `POST` request defining a new connection. It should store this in the `DatabaseConnection` model, and initially no users should have access to that. Then, the backend should receive `PATCH` requests to modify various users, giving them access to use that connection. Note that this does _not_ directly modify anything to do with permissions on actual database objects (e.g., schemata or tables). I don't think our "Manager", "Editor", "Viewer" framework makes sense here. You have two levels: "Admin" (this should be a proper Mathesar superuser), and "Connecter" (a user who's allowed to use a given connection). Thus, we don't actually need most power of the access policy framework. A simple many-to-many map between connections and users suffices to control usage privileges, and the ability to modify that mapping (or indeed to look at the mapping) should be locked behind a superuser login. + +## Granting database object privileges + +The backend will provide an endpoint that lets an admin (who has access to a sufficiently-privileged role via a connection) access database-level permission granting functionality directly. So, to grant access to create tables in a schema to a Mathesar user, the admin needs to send a `PATCH` request to CRUD endpoint that defines a privilege-granting query (via a function) on the database. Note that this `PATCH` request doesn't actually modify any model instance. We should probably namespace this endpoint differently than the Mathesar user admin for clarity. + +- Privileges on DB objects are thus not granted directly to Mathesar users. +- Privileges on DB objects are granted to DB roles, which may be accessible to some (or all) Mathesar users. + +!!! question "UX Question" + Should access to Connections and access to DB objects via connections be presented the same way in the UI? + +!!! question "UX Question" + Should the admin think in terms of groups of Mathesar users, or specifically in terms of connections when dealing with DB-level privileges? + +!!! danger "Potential Confusion" + In case the admin or DBA want multiple Mathesar users to be able to modify various DB objects, three options are available: + - Give all relevant Mathesar users access to connect as the owning DB user. + - Create a DB role owning the desired tables, and `GRANT` that role to DB users connectible by the relevant Mathesar users. + - Have at least one DB superuser available for use with a connection, and give relevant Mathesar users access to that DB superuser. + +## Single-user metadata object privileges + +There are some metadata objects for which privileges shouldn't really be granted or revoked. An example is the display options for a column. For these types of objects, we should simply keep an 'owning user' attribute of each object instance, and use that to apply the correct version of the metadata when returning object info from the database. + +## Multi-user metadata object privileges + +Examples of these are table properties like preview columns. We've decided these should be per-table, and so this metadata will be stored on the user DB in a `msar_cat` namespace. Thus, modifying this metadata will occur in a request to modify the associated table, and the privilege check will be whether the relevant DB user is allowed to `ALTER` that table. + +## Standalone Mathesar objects + +An example of this would be an Exploration. The privileges needed to actually _run_ the query are handled on the database as described above, but a user should have access to look at the exploration definition itself (as well as edit/delete) it handled by the Django access policy framework. It seems like our current Manager/Editor/Viewer framework should suffice, but needs to be applied directly to such objects. + +!!! question "UX Question" + How should viewing others' objects work for these? Should they be namespaced under some "workspace" concept (currently we're using database/schema)? From 99d5e2de908e3a8446fee235a4446a1ccc991973 Mon Sep 17 00:00:00 2001 From: Brent Moran Date: Wed, 18 Oct 2023 16:06:49 +0800 Subject: [PATCH 03/24] Update with hierarchichal permissions notes --- docs/engineering/architecture/permissions.md | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/docs/engineering/architecture/permissions.md b/docs/engineering/architecture/permissions.md index a9a8e3031..f769d6195 100644 --- a/docs/engineering/architecture/permissions.md +++ b/docs/engineering/architecture/permissions.md @@ -8,7 +8,7 @@ As described in [the models section](./models.md), This will store the informati ## Adding Connections (backend perspective) -Regardless of UI, the backend should receive a `POST` request defining a new connection. It should store this in the `DatabaseConnection` model, and initially no users should have access to that. Then, the backend should receive `PATCH` requests to modify various users, giving them access to use that connection. Note that this does _not_ directly modify anything to do with permissions on actual database objects (e.g., schemata or tables). I don't think our "Manager", "Editor", "Viewer" framework makes sense here. You have two levels: "Admin" (this should be a proper Mathesar superuser), and "Connecter" (a user who's allowed to use a given connection). Thus, we don't actually need most power of the access policy framework. A simple many-to-many map between connections and users suffices to control usage privileges, and the ability to modify that mapping (or indeed to look at the mapping) should be locked behind a superuser login. +Regardless of UI, the backend should receive a `POST` request defining a new connection. It should store this in the `DatabaseConnection` model, and initially no users should have access to that. Then, the backend should receive `PATCH` requests to modify various users, giving them access to use that connection. Note that this does _not_ directly modify anything to do with permissions on actual database objects (e.g., schemata or tables). I don't think our "Manager", "Editor", "Viewer" framework makes sense here. You have two levels: "Admin" (a proper Mathesar superuser), and "Connecter" (a user who's allowed to use a given connection). Thus, we don't actually need most power of the access policy framework. A simple many-to-many map between connections and users suffices to control usage privileges, and the ability to modify that mapping (or indeed to look at the mapping) should be locked behind a superuser login. ## Granting database object privileges @@ -43,3 +43,15 @@ An example of this would be an Exploration. The privileges needed to actually _r !!! question "UX Question" How should viewing others' objects work for these? Should they be namespaced under some "workspace" concept (currently we're using database/schema)? + +## Note on hierarchical permissions + +We can allow an "admin" DB user by automatically granting the role associated with each Mathesar-managed connection to to a given admin DB user. So, if we have a user `mathesaradmin`, and a regular user `joe`, we can run `GRANT joe TO mathesaradmin` (as a role with sufficient privileges; At least `ADMIN` is needed on `joe` to run this). This would let `mathesaradmin` act as a Manager on anything created by `joe`. + +Our `Manager` concept implies (co-)ownership of all managed sub-objects. I.e., a Database Manager owns all objects in that database (using the description in our docs) + +Our `Editor` concept implies `SELECT`, `INSERT`, `UPDATE`, `DELETE`, `TRUNCATE`, `REFERENCES` (sort of; with the way Mathesar currently treats fkeys) + +Our `Viewer` concept implies `SELECT` on objects (obviously) + +Thus, _if_ we want to recreate our current conceptual framework, it's possible (and not too difficult). From 9a87de0238b3095aa62abe17b1386c58cde63683 Mon Sep 17 00:00:00 2001 From: Brent Moran Date: Wed, 18 Oct 2023 22:57:36 +0800 Subject: [PATCH 04/24] Add more UX questions about permissions --- docs/engineering/architecture/permissions.md | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/docs/engineering/architecture/permissions.md b/docs/engineering/architecture/permissions.md index f769d6195..98b395e52 100644 --- a/docs/engineering/architecture/permissions.md +++ b/docs/engineering/architecture/permissions.md @@ -8,7 +8,10 @@ As described in [the models section](./models.md), This will store the informati ## Adding Connections (backend perspective) -Regardless of UI, the backend should receive a `POST` request defining a new connection. It should store this in the `DatabaseConnection` model, and initially no users should have access to that. Then, the backend should receive `PATCH` requests to modify various users, giving them access to use that connection. Note that this does _not_ directly modify anything to do with permissions on actual database objects (e.g., schemata or tables). I don't think our "Manager", "Editor", "Viewer" framework makes sense here. You have two levels: "Admin" (a proper Mathesar superuser), and "Connecter" (a user who's allowed to use a given connection). Thus, we don't actually need most power of the access policy framework. A simple many-to-many map between connections and users suffices to control usage privileges, and the ability to modify that mapping (or indeed to look at the mapping) should be locked behind a superuser login. +Regardless of UI, the backend should receive a `POST` request defining a new connection. It should store this in the `DatabaseConnection` model, and initially no users should have access to that. Then, the backend should receive `PATCH` requests to modify various users, giving them access to use that connection. Note that this does _not_ directly modify anything to do with permissions on actual database objects (e.g., schemata or tables). I don't think our "Manager", "Editor", "Viewer" framework makes sense here. You have two levels: "Admin" (a proper Mathesar superuser), and "User" (a user who's allowed to use a given connection). + +!!! question "UX Question" + Is there a use-case for having a non-admin "Manager" role who can grant access to a given connection string, but not others? ## Granting database object privileges @@ -26,16 +29,16 @@ The backend will provide an endpoint that lets an admin (who has access to a suf !!! danger "Potential Confusion" In case the admin or DBA want multiple Mathesar users to be able to modify various DB objects, three options are available: - Give all relevant Mathesar users access to connect as the owning DB user. - - Create a DB role owning the desired tables, and `GRANT` that role to DB users connectible by the relevant Mathesar users. + - `GRANT` the owning DB role (ostensibly a user) to DB users connectable by the relevant Mathesar users. - Have at least one DB superuser available for use with a connection, and give relevant Mathesar users access to that DB superuser. ## Single-user metadata object privileges -There are some metadata objects for which privileges shouldn't really be granted or revoked. An example is the display options for a column. For these types of objects, we should simply keep an 'owning user' attribute of each object instance, and use that to apply the correct version of the metadata when returning object info from the database. +There are some metadata objects for which privileges shouldn't really be granted or revoked. An example is the display options for a column. For these types of objects, we should simply keep an 'owning user' attribute of each model instance, and use that to apply the correct version of the metadata when returning object info from the database. Note that if we end up exposing an endpoint to work on these metadata models directly, then we'll need access policies. ## Multi-user metadata object privileges -Examples of these are table properties like preview columns. We've decided these should be per-table, and so this metadata will be stored on the user DB in a `msar_cat` namespace. Thus, modifying this metadata will occur in a request to modify the associated table, and the privilege check will be whether the relevant DB user is allowed to `ALTER` that table. +Examples of these are table properties like preview columns. We've decided these should be per-table, and so this metadata will be stored on the user DB in a `msar_cat` namespace. Thus, modifying this metadata will occur in a request to modify the associated table, and the privilege check will be whether the relevant DB user is allowed to `ALTER` that table. Viewing this metadata would check whether the relevant DB user is allowed to `SELECT` that table. ## Standalone Mathesar objects @@ -44,6 +47,13 @@ An example of this would be an Exploration. The privileges needed to actually _r !!! question "UX Question" How should viewing others' objects work for these? Should they be namespaced under some "workspace" concept (currently we're using database/schema)? +## Shared links + +A shared table or exploration needs access to a `DatabaseConnection` to run (or be viewed). We should bypass any permission checks when actually retrieving data. The shared object model should include an attribute giving the `DatabaseConnection` instance needed to run (or view). + +!!! danger "Tech/Product concern" + The safest (and easiest) implementation would be to have specialized view-only users who are `GRANT`ed `SELECT` on relevant DB objects when needed. Then, sharable links would _only_ use those users. But, we'll need to justify this choice in documentation somewhere. + ## Note on hierarchical permissions We can allow an "admin" DB user by automatically granting the role associated with each Mathesar-managed connection to to a given admin DB user. So, if we have a user `mathesaradmin`, and a regular user `joe`, we can run `GRANT joe TO mathesaradmin` (as a role with sufficient privileges; At least `ADMIN` is needed on `joe` to run this). This would let `mathesaradmin` act as a Manager on anything created by `joe`. From c327644416a5d18dbf579a2adf1e769ad76b73a2 Mon Sep 17 00:00:00 2001 From: Brent Moran Date: Fri, 16 Feb 2024 07:36:27 +0000 Subject: [PATCH 05/24] Update and clean up base architecture page --- docs/engineering/architecture/index.md | 70 ++++++++++++++++++-------- 1 file changed, 48 insertions(+), 22 deletions(-) diff --git a/docs/engineering/architecture/index.md b/docs/engineering/architecture/index.md index ccdaba01b..b58ffe205 100644 --- a/docs/engineering/architecture/index.md +++ b/docs/engineering/architecture/index.md @@ -1,4 +1,4 @@ -# Proposed Backend Architecture +# Proposed Back End Architecture The main pieces of the back end code base are described here, and more depth is available in specific pages. @@ -7,19 +7,25 @@ The main pieces of the back end code base are described here, and more depth is We need a couple of terms to get started: - **web service**: the Django service -- **Django DB**: the DB (postgres or sqlite3) where Django keeps model data +- **Django DB**: the DB (PostgreSQL or sqlite3) where Django keeps model data - **User DB**: a DB owned by a Mathesar user, with their tables on it. This is a DB shown in the Mathesar UI. ## Bird's Eye View -The big goal of this architectural redesign is that we want to move away from using Django models to represent DB objects, and instead use functions to act on those DB objects directly. Insofar as we need to enrich the data about DB objects with other metadata, we'll do that after gathering the relevant info from the User DB. +The main goals of this architectural redesign are to improve the speed and reduce the complexity of the back end. Secondary goals are to improve the convenience of the API for the use case of our front end, and make it easier for users and contributors to identify which back end code supports a given front end feature. We plan to do this by: -## Example: Getting the table info in a schema +- Removing Django models representing User DB objects (e.g., tables), replacing them with functions that query and act on those DB objects directly. Insofar as we need to enrich the metadata about User DB objects with Mathesar-managed metadata, we'll do that after gathering the relevant info from the User DB. +- Changing our API to use a JSON-RPC (2.0) spec. -To get the table info in a schema, we call the endpoint `GET /api/db/v0/tables/`. We use a query string parameter to indicate we want to filter results from that endpoint based on a schema (identified by a Django-assigned integer id). Internally, then the following happens: +## A Motivating Example: Getting the table info in a schema -1. Django builds a query that gets some `Table` model instances from the Django DB, filtered based on the desired schema, and runs it. This gets the following info for each table: - - `created_at` -- The date of createion of the table model instance (not the actual table) +To get the table info in a schema, + +1. The front end calls the endpoint `GET /api/db/v0/tables/` using a query string parameter to filter results from that endpoint based on a schema (identified by a Django-assigned integer id). Internally, then the following happens: + +1. The web service builds a query that gets some `Table` model instances from the Django DB, filtered based on the desired schema, as well as filtered according to applicable access control policies, and runs it. This gets the following info for each table: + + - `created_at` -- The date of creation of the table model instance (not the actual table) - `updated_at` -- The date of last modification of the table model instance (not the actual table) - `import_verified` -- Whether the import process was verified by the user for this table - `is_temp` -- Whether this table is supposed to be copied into a preexisting table, then deleted @@ -27,23 +33,36 @@ To get the table info in a schema, we call the endpoint `GET /api/db/v0/tables/` - `schema_id` -- The Django id of the schema containing the table - `data_files` -- A list of any data files imported to the table - `settings` -- Some metadata describing how the table is displayed -1. We then gather the following info _for each table_ by querying the user DB. These queries are initiated by `@property` annotations in the Django models. + +1. The web service determines which connection to use with the User DB by querying for which `Database` model (called connections in the API) the requested schema lives under, and asking that model to give it a connection string. + +1. The web service then gathers the following info _for each table_ by querying the User DB. These queries are initiated by `@property` annotations in the Django models. + - `name` - `description` -- The comment (description) of the table, defined in the User DB. - `has_depenents` (many requests for this, actually) - `columns` -- These are found by following a foreign key link on the Django DB. Each column model instance then runs a bunch of queries to gather relevant info from the user DB and Django DB. -1. For each column of each table, we gather the following info by querying the Django DB (can be batched): + +1. _For each column of each table_, the web service then gathers the following info by querying the Django DB (can be batched): + - `display_options` -- These describe table-wide metadata about how to show the table in the UI. -1. For each column of each table, we gather the following info by querying the user DB, again with queries initiated by `@property` annotations on the Django `Column` model. For these, we end up making separate requests: + +1. _For each column of each table_, we gather the following info by querying the user DB, again with queries initiated by `@property` annotations on the Django `Column` model. For these, we end up making separate requests: + - `name` (multiple requests to user DB for each column) - `type` - `type_options` -- e.g., the precision specified for a `numeric` column All of this gets joined together, then sent back as a response from the API. -With the new architecture, we'll instead: +With the new architecture, to get the same info, + +1. The front end calls an RPC endpoint `/api/v0/rpc/`, calling a function `get_schema_table_details` with `database` and `schema` parameters. The database is identified by a Django id referring to a database model, and the schema is identified by an OID. + +1. The web service uses the `user` and `database` (the user is picked up from the request object) to acquire a connection string. + +1. Using that connection, the web service calls a PL/pgSQL function installed on the User DB called `get_schema_table_details` to gather -1. Call a PL/pgSQL function installed on the user database called `get_schema_table_details` It will run on the user database to gather - `name` - `description` - `has_dependents` @@ -52,14 +71,17 @@ With the new architecture, we'll instead: - `type` - `type_options` - `preview_settings` -- describes how we should show each table's rows when it's linked to by a foreign key -1. Using the returned info, filter a `TableMetadata` model based on returned `oid`s, and gather + +1. Using the returned info, the web service filters a `TableMetadata` model based on returned `oid`s, and gathers + - `import_verified` - `is_temp` - `import_target_id` - - `schema_id` - `data_files` - `column_order` -- describes the order in which columns should be displayed + 1. Using the same returned info, filter a `ColumnMetadata` model based on returned `oid, attnum` pairs to gather (for each column) + - `display_options` We then join all of this together, and return it as a response from the API. @@ -74,12 +96,12 @@ The layers introduced here will be discussed in more detail in other sections. For each API call, there should be an identifiable DB function that performs all User Database operations needed to satisfy that call. For example, -- `GET /api/db/v0/tables/` should result in a call to some function `get_tables(sch_oid oid)` on the database. +- Calling the function `get_tables(database, schema)` should result in a call to some function `get_tables(sch_oid oid)` on the `database`. To achieve this, we will install the following on the user database(s) -- Some custom Mathesar types -- Some tables holding metadata that needs to be kept near the tables +- Some custom Mathesar types. +- Some tables holding metadata that needs to be kept near the tables. - A set of functions that provide the bulk of Mathesar's back end logic and functionality. ### Python `db` library @@ -88,20 +110,24 @@ This library should mostly serve to provide thin wrapper functions around the Us ### Web service -This service should provide an API and some views for use by the front end. When an API endpoint is called (or view is loaded), the service should: +This service should provide an JSON-RPC API for use by the front end. When an API function is called, the service should: -- Grab an appropriate engine from the `DatabaseConnection` model -- Call the relevant `db` library function (should be only one in most cases) +- Grab an appropriate engine from the `DatabaseConnection` model by combining the user associated with the request with the `database`. +- Call the relevant `db` library function (should be only one in most cases). - Gathers data from the service database via models if needed (this is for metadata that's inappropriate for storage in the User DB for some reason) - Returns it to the API ## Permissions and users -We should, whenever possible, derive the access to any Django model based on access to the underlying DB object in real time. Details are [here](./permissions.md). +- All permission checks for accessing a User DB object (e.g., table) should happen on the User DB. We should not add another layer of checks for these objects in the web service. +- Permission checks for accessing and managing info in the Django DB (e.g., Exploration definitions) are handled in the web service. +- We should, whenever possible, permissions on Django models based on access to the underlying DB object in real time. Details are [here](./permissions.md). ### Example -A user lists the columns for a table. Because they have access to read the columns of the table, they can read (their) display options for a table. If they have access to modify a column of a table, they have access to modify the relevant display options. This works as long as their isn't a dedicated `display_options` endpoint which could receive requests directly. +**TODO** Clean this up and check it. + +A user lists the columns for a table. Because they have access to read the columns of the table (checked on the DB), they can read (their) display options for a table. If they have access to modify a column of a table, they have access to modify the relevant display options. This works as long as there isn't a dedicated `display_options` endpoint which could receive requests directly. ### Exceptions From 3036a270842227017b0446e2ea147340602f2232 Mon Sep 17 00:00:00 2001 From: Brent Moran Date: Mon, 19 Feb 2024 06:20:35 -0600 Subject: [PATCH 06/24] Update models page with new setup for DB connections --- docs/engineering/architecture/models.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/docs/engineering/architecture/models.md b/docs/engineering/architecture/models.md index e08e7f83d..a9d89204f 100644 --- a/docs/engineering/architecture/models.md +++ b/docs/engineering/architecture/models.md @@ -57,9 +57,14 @@ Stores connection info to allow accessing a DB by creating an SQLAlchemy engine. Referenced by DatabaseRole and Schema models. -Replace this with a `DatabaseConnection` model containing a DB hostname, port, database, and username. We should add functionality to store some details in a [`.pgpass`](https://www.postgresql.org/docs/current/libpq-pgpass.html) dotfile (though probably in a custom location). `psycopg` can inject the password automatically through these means. +Replace this with a pair of models: -The Django permissions infrastructure should handle whether a user can access the DB at all in the Django layer (i.e., whether they can use the connection string), and other permissions on the DB should be handled by restricting the relevant user (referred to by the conn string) on the DB. For this to be sensible, we need a UI concept of a database user (beyond just connecting your Mathesar user to a database) This could be something like `mathesar@mathesar_tables` (in our default setup). _If_ we don't think that's feasible from a UX perspective, we should set up a database user for each Mathesar user, on each database, with minimal permissions (`CONNECT` and `CREATE`). +- a `DatabaseServerCredential` model containing a DB hostname, port, and username; and +- a `UserDatabaseMap` model containing a `User` fkey, a `DatabaseServerCredential` fkey, and a `DatabaseServerCredential` fkey. The model should additionally have a `database` field, and the `user_id`, `database` pair should be unique. + +We should eventually add functionality to store some details in a [`.pgpass`](https://www.postgresql.org/docs/current/libpq-pgpass.html) dotfile (though probably in a custom location). `psycopg` can inject the password automatically through these means. + +The Django permissions infrastructure should handle CRUD operations on `DatabaseServerCredential` and `UserDatabaseMap` resources. Actually accessing a database wouldn't require the permissions infrastructure; we'd instead construct a connection string by joining the appropriate `database` to the other info found by looking up the `user_id, database` pair. ### DatabaseRole @@ -96,8 +101,6 @@ This stores a role on a given database for a given user. We should delete this m This stores metadata about files which have been uploaded for import into Mathesar. We should keep this model. `table_imported_to_id` should be removed (it's not used anywhere is it?). Also `max_level` seems like less of a data file attribute and more of an import setting. -Request from Pavish: We should reduce the number of requests in the import process. Right now it's many. Consider an RPC endpoint. Brent agrees this is a good idea. - ### PreviewColumnSettings | Column | Type | @@ -238,5 +241,4 @@ This stores a definition of a stored query that can be run on command. The main | short_name | character varying(255) | | password_change_needed | boolean | -This stores user metadata. I think we should mostly keep it as is, - +This stores user metadata. I think we should mostly keep it as is. It will be referenced by the `DatabaseServerCredential` modeel. From ca65faaa44dfc6525b84a71f26282c6a03e91ef0 Mon Sep 17 00:00:00 2001 From: Brent Moran Date: Mon, 19 Feb 2024 08:46:43 -0600 Subject: [PATCH 07/24] separate Database model for extensibility --- docs/engineering/architecture/models.md | 145 ++++++++++++------------ 1 file changed, 73 insertions(+), 72 deletions(-) diff --git a/docs/engineering/architecture/models.md b/docs/engineering/architecture/models.md index a9d89204f..9b9b5c0d1 100644 --- a/docs/engineering/architecture/models.md +++ b/docs/engineering/architecture/models.md @@ -57,10 +57,11 @@ Stores connection info to allow accessing a DB by creating an SQLAlchemy engine. Referenced by DatabaseRole and Schema models. -Replace this with a pair of models: +Replace this with these models: +- a `Database` model containing (for now) only an ID, an OID, and a `database` (name). We could also add other metadata to this model as appropriate. - a `DatabaseServerCredential` model containing a DB hostname, port, and username; and -- a `UserDatabaseMap` model containing a `User` fkey, a `DatabaseServerCredential` fkey, and a `DatabaseServerCredential` fkey. The model should additionally have a `database` field, and the `user_id`, `database` pair should be unique. +- a `UserDatabaseMap` model containing a `user_id` fkey to the `User` model, a `database_id` fkey to the `Database` model, and a `database_server_credential_id` fkey to the `DatabaseServerCredential` model. The `UserDatabaseMap` model should additionally have a `database` field, and the `user_id`, `database` pair should be unique. We should eventually add functionality to store some details in a [`.pgpass`](https://www.postgresql.org/docs/current/libpq-pgpass.html) dotfile (though probably in a custom location). `psycopg` can inject the password automatically through these means. @@ -135,55 +136,55 @@ Delete this model. All permissions handled by the referencing SchemaRole should ### SchemaRole -| Column | Type | -|------------|--------------------------| -| id | integer | -| created_at | timestamp with time zone | -| updated_at | timestamp with time zone | -| role | character varying(10) | -| schema_id | integer | -| user_id | integer | +| Column | Type | +|-------------|--------------------------| +| id | integer | +| created\_at | timestamp with time zone | +| updated\_at | timestamp with time zone | +| role | character varying(10) | +| schema\_id | integer | +| user\_id | integer | This should be deleted, and the permissions should be instead managed on the underlying DB. ### SharedQuery -| Column | Type | -|------------|--------------------------| -| id | integer | -| created_at | timestamp with time zone | -| updated_at | timestamp with time zone | -| slug | uuid | -| enabled | boolean | -| query_id | integer | +| Column | Type | +|-------------|--------------------------| +| id | integer | +| created\_at | timestamp with time zone | +| updated\_at | timestamp with time zone | +| slug | uuid | +| enabled | boolean | +| query\_id | integer | This model should stay. No changes here. We need to add metadata about which user created the share, so we can use that user's allowed DB credentials for accessing the underlying assets (tables, schemata, etc.) ### SharedTable -| Column | Type | -|------------|--------------------------| -| id | integer | -| created_at | timestamp with time zone | -| updated_at | timestamp with time zone | -| slug | uuid | -| enabled | boolean | -| table_id | integer | +| Column | Type | +|-------------|--------------------------| +| id | integer | +| created\_at | timestamp with time zone | +| updated\_at | timestamp with time zone | +| slug | uuid | +| enabled | boolean | +| table\_id | integer | Only change is that we need to refer directly to a table OID, and handle permissions. ### Table -| Column | Type | -|------------------|--------------------------| -| id | integer | -| created_at | timestamp with time zone | -| updated_at | timestamp with time zone | -| oid | integer | -| import_verified | boolean | -| is_temp | boolean | -| import_target_id | integer | -| schema_id | integer | +| Column | Type | +|--------------------|--------------------------| +| id | integer | +| created\_at | timestamp with time zone | +| updated\_at | timestamp with time zone | +| oid | integer | +| import\_verified | boolean | +| is\_temp | boolean | +| import\_target\_id | integer | +| schema\_id | integer | Stores info about: @@ -196,49 +197,49 @@ We should combine this with the `TableSettings` model to create a `TableMetadata ### TableSettings -| Column | Type | -|---------------------|--------------------------| -| id | integer | -| created_at | timestamp with time zone | -| updated_at | timestamp with time zone | -| column_order | jsonb | -| preview_settings_id | integer | -| table_id | integer | +| Column | Type | +|-----------------------|--------------------------| +| id | integer | +| created\_at | timestamp with time zone | +| updated\_at | timestamp with time zone | +| column\_order | jsonb | +| preview\_settings\_id | integer | +| table\_id | integer | This stores Mathesar-specific metadata about tables. Should be combined with remains of `Table` model. ### UIQuery -| Column | Type | -| ----------------|--------------------------| -| id | integer | -| created_at | timestamp with time zone | -| updated_at | timestamp with time zone | -| name | character varying(128) | -| description | text | -| initial_columns | jsonb | -| transformations | jsonb | -| display_options | jsonb | -| display_names | jsonb | -| base_table_id | integer | +| Column | Type | +|------------------|--------------------------| +| id | integer | +| created\_at | timestamp with time zone | +| updated\_at | timestamp with time zone | +| name | character varying(128) | +| description | text | +| initial\_columns | jsonb | +| transformations | jsonb | +| display\_options | jsonb | +| display\_names | jsonb | +| base\_table\_id | integer | This stores a definition of a stored query that can be run on command. The main changes are that it should refer directly to DB-layer ids (oids and attnums) rather than Django-layer. ### User -| Column | Type | -| -----------------------|--------------------------| -| id | integer | -| password | character varying(128) | -| last_login | timestamp with time zone | -| is_superuser | boolean | -| username | character varying(150) | -| email | character varying(254) | -| is_staff | boolean | -| is_active | boolean | -| date_joined | timestamp with time zone | -| full_name | character varying(255) | -| short_name | character varying(255) | -| password_change_needed | boolean | - -This stores user metadata. I think we should mostly keep it as is. It will be referenced by the `DatabaseServerCredential` modeel. +| Column | Type | +|--------------------------|--------------------------| +| id | integer | +| password | character varying(128) | +| last\_login | timestamp with time zone | +| is\_superuser | boolean | +| username | character varying(150) | +| email | character varying(254) | +| is\_staff | boolean | +| is\_active | boolean | +| date\_joined | timestamp with time zone | +| full\_name | character varying(255) | +| short\_name | character varying(255) | +| password\_change\_needed | boolean | + +This stores user metadata. I think we should mostly keep it as is. It will be referenced by the `DatabaseServerCredential` model. From 27ac725c9e7118ca1325c4887e0478a1800fa1a7 Mon Sep 17 00:00:00 2001 From: Brent Moran Date: Mon, 19 Feb 2024 09:04:15 -0600 Subject: [PATCH 08/24] Update connection wrangling descriptions on permissions page --- docs/engineering/architecture/models.md | 2 +- docs/engineering/architecture/permissions.md | 25 +++++++++++--------- 2 files changed, 15 insertions(+), 12 deletions(-) diff --git a/docs/engineering/architecture/models.md b/docs/engineering/architecture/models.md index 9b9b5c0d1..8632581ea 100644 --- a/docs/engineering/architecture/models.md +++ b/docs/engineering/architecture/models.md @@ -59,7 +59,7 @@ Referenced by DatabaseRole and Schema models. Replace this with these models: -- a `Database` model containing (for now) only an ID, an OID, and a `database` (name). We could also add other metadata to this model as appropriate. +- a `Database` model containing (for now) only an ID, an OID, and a `database` (name). We could also add other metadata to this model as appropriate (consider keeping `editable`). - a `DatabaseServerCredential` model containing a DB hostname, port, and username; and - a `UserDatabaseMap` model containing a `user_id` fkey to the `User` model, a `database_id` fkey to the `Database` model, and a `database_server_credential_id` fkey to the `DatabaseServerCredential` model. The `UserDatabaseMap` model should additionally have a `database` field, and the `user_id`, `database` pair should be unique. diff --git a/docs/engineering/architecture/permissions.md b/docs/engineering/architecture/permissions.md index 98b395e52..89c549373 100644 --- a/docs/engineering/architecture/permissions.md +++ b/docs/engineering/architecture/permissions.md @@ -2,26 +2,29 @@ The big picture is that we will implement access management for DB objects (including databases themselves) by giving admin users of Mathesar the ability to use database-layer permissions management tools (e.g., `GRANT`). For web service resources (e.g., Django model instances), permissions will be managed in Django. -## Database Connection Model +## Database Model -As described in [the models section](./models.md), This will store the information needed to create an engine and connect to a database. Because the information includes a role (to define the initial connection role), it necessarily defines a set of privileges available on the database with that connection. +This model will be reduced to store nothing more than metadata about a given database, as well as its name for use when constructing an actual connection to that database. -## Adding Connections (backend perspective) +## Database Server Credential Model -Regardless of UI, the backend should receive a `POST` request defining a new connection. It should store this in the `DatabaseConnection` model, and initially no users should have access to that. Then, the backend should receive `PATCH` requests to modify various users, giving them access to use that connection. Note that this does _not_ directly modify anything to do with permissions on actual database objects (e.g., schemata or tables). I don't think our "Manager", "Editor", "Viewer" framework makes sense here. You have two levels: "Admin" (a proper Mathesar superuser), and "User" (a user who's allowed to use a given connection). +As described in [the models section](./models.md), This will store the authentication information needed to create an engine, but won't provide a database (a required part of a connection definition in PostgreSQL). Because the information includes a role (to define the initial connection role), it necessarily defines a set of privileges available on the database with that connection. -!!! question "UX Question" - Is there a use-case for having a non-admin "Manager" role who can grant access to a given connection string, but not others? +## User Database Map Model + +This map uses a `user_id, database` pair to look up the needed credentials, then provides a connection to the database using those credentials (if possible). + +## Adding Connections (backend perspective) + +Regardless of UI, the backend should receive a `POST` request to the new RPC endpoint defining a new connection. We'll use the same functions set up in [PR \#3348](https://github.com/mathesar-foundation/mathesar/pull/3348). The relevant info should be stored in the `Database` and `DatabaseServerCredential` models. Also, if the user does not already have an entry defining their role on the given database, we could create such an entry automatically in the `UserDatabaseMap` model. This is optional, and doesn't really affect the architecture. Then, to let some other users access that connection, we will provide an RPC function that lets an admin set different users' credentials for the given database by creating or updating `UserDatabaseMap` resources. Note that this does _not_ directly modify anything to do with permissions on actual database objects (e.g., schemata or tables). ## Granting database object privileges -The backend will provide an endpoint that lets an admin (who has access to a sufficiently-privileged role via a connection) access database-level permission granting functionality directly. So, to grant access to create tables in a schema to a Mathesar user, the admin needs to send a `PATCH` request to CRUD endpoint that defines a privilege-granting query (via a function) on the database. Note that this `PATCH` request doesn't actually modify any model instance. We should probably namespace this endpoint differently than the Mathesar user admin for clarity. +The backend will provide RPC functions that let an admin (who has access to a sufficiently-privileged database role via a connection) access database-level permission granting functionality directly. So, to grant access to create tables in a schema to a Mathesar user, the admin uses an RPC function that defines a privilege-granting query (via a PL/pgSQL function) on the database. Note that this request doesn't actually modify any model instance. - Privileges on DB objects are thus not granted directly to Mathesar users. - Privileges on DB objects are granted to DB roles, which may be accessible to some (or all) Mathesar users. - -!!! question "UX Question" - Should access to Connections and access to DB objects via connections be presented the same way in the UI? +- Anytime an RPC function requiring DB access runs, it runs using the connection defined via the `UserDatabaseMap`, with the associated privileges on that database. !!! question "UX Question" Should the admin think in terms of groups of Mathesar users, or specifically in terms of connections when dealing with DB-level privileges? @@ -38,7 +41,7 @@ There are some metadata objects for which privileges shouldn't really be granted ## Multi-user metadata object privileges -Examples of these are table properties like preview columns. We've decided these should be per-table, and so this metadata will be stored on the user DB in a `msar_cat` namespace. Thus, modifying this metadata will occur in a request to modify the associated table, and the privilege check will be whether the relevant DB user is allowed to `ALTER` that table. Viewing this metadata would check whether the relevant DB user is allowed to `SELECT` that table. +Examples of these are table properties like preview columns. We've decided these should be per-table, and so this metadata will be stored on the user DB in a `msar_cat` namespace. Thus, modifying this metadata will occur in a request to modify the associated table, and the privilege check will be whether the relevant DB user is allowed to `ALTER` that table. ## Standalone Mathesar objects From ebe4b03174b8dc617776ad207e929661de0b43c1 Mon Sep 17 00:00:00 2001 From: Brent Moran Date: Mon, 19 Feb 2024 14:36:08 -0600 Subject: [PATCH 09/24] update permissions on connections with derived exploration permissions --- docs/engineering/architecture/permissions.md | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/docs/engineering/architecture/permissions.md b/docs/engineering/architecture/permissions.md index 89c549373..5f8421928 100644 --- a/docs/engineering/architecture/permissions.md +++ b/docs/engineering/architecture/permissions.md @@ -35,20 +35,19 @@ The backend will provide RPC functions that let an admin (who has access to a su - `GRANT` the owning DB role (ostensibly a user) to DB users connectable by the relevant Mathesar users. - Have at least one DB superuser available for use with a connection, and give relevant Mathesar users access to that DB superuser. -## Single-user metadata object privileges - -There are some metadata objects for which privileges shouldn't really be granted or revoked. An example is the display options for a column. For these types of objects, we should simply keep an 'owning user' attribute of each model instance, and use that to apply the correct version of the metadata when returning object info from the database. Note that if we end up exposing an endpoint to work on these metadata models directly, then we'll need access policies. - ## Multi-user metadata object privileges Examples of these are table properties like preview columns. We've decided these should be per-table, and so this metadata will be stored on the user DB in a `msar_cat` namespace. Thus, modifying this metadata will occur in a request to modify the associated table, and the privilege check will be whether the relevant DB user is allowed to `ALTER` that table. ## Standalone Mathesar objects -An example of this would be an Exploration. The privileges needed to actually _run_ the query are handled on the database as described above, but a user should have access to look at the exploration definition itself (as well as edit/delete) it handled by the Django access policy framework. It seems like our current Manager/Editor/Viewer framework should suffice, but needs to be applied directly to such objects. +An example of this would be an Exploration. The privileges needed to actually _run_ the query are handled on the database as described above, but a user should have access to look at the exploration definition itself (as well as edit/delete) handled by the Django access policy framework. It seems like our current Manager/Editor/Viewer framework should suffice. -!!! question "UX Question" - How should viewing others' objects work for these? Should they be namespaced under some "workspace" concept (currently we're using database/schema)? +In the case of Explorations (UIQuery model), this will be derived from a policy scoped via the `Database` associated with the Exploration: +- A super user of Mathesar will set the policy for `DatabaseServerCredential` instances via the UI. + - Each credential instance represents a Role on the DB. + - When a User wants to view/edit/manage an Exploration, the web service will check the `user, database` pair (where `database`) is the Database associated with the Exploration to get a `DatabaseServerCredential` if one exists (otherwise, no permissions are granted) + - Based on the policy applied to that credential, the user can then act on the Exploration. ## Shared links From b046bcbebda4ff82952d6bd6c226a82e0cb61183 Mon Sep 17 00:00:00 2001 From: Brent Moran Date: Mon, 19 Feb 2024 14:37:16 -0600 Subject: [PATCH 10/24] dispatch TODOs, tidy up models page --- docs/engineering/architecture/index.md | 8 +------- docs/engineering/architecture/models.md | 5 ++--- 2 files changed, 3 insertions(+), 10 deletions(-) diff --git a/docs/engineering/architecture/index.md b/docs/engineering/architecture/index.md index b58ffe205..a9459d346 100644 --- a/docs/engineering/architecture/index.md +++ b/docs/engineering/architecture/index.md @@ -125,9 +125,7 @@ This service should provide an JSON-RPC API for use by the front end. When an AP ### Example -**TODO** Clean this up and check it. - -A user lists the columns for a table. Because they have access to read the columns of the table (checked on the DB), they can read (their) display options for a table. If they have access to modify a column of a table, they have access to modify the relevant display options. This works as long as there isn't a dedicated `display_options` endpoint which could receive requests directly. +A user lists the columns for a table. Because they have access to read the columns of the table (checked on the DB), they can read display options for a table. If they have access to modify a column of a table, they have access to modify the relevant display options. This works as long as there isn't a dedicated `display_options` endpoint which could receive requests directly. ### Exceptions @@ -138,7 +136,3 @@ There are some metadata and other models that we'll be keeping which _can_ recei - Explorations Access to these will be managed using the Django permissions framework (i.e., with access policies). - -### We should avoid - -We should absolutely avoid caching or storing permissions and access to Django models based on permissions for underlying DB models. For example, if we ever want a user to manage display options directly, they should own _any_ display option they've created (and have access to it) regardless of their access (or lack of it) to the underlying DB object. diff --git a/docs/engineering/architecture/models.md b/docs/engineering/architecture/models.md index 8632581ea..0a11bc94c 100644 --- a/docs/engineering/architecture/models.md +++ b/docs/engineering/architecture/models.md @@ -1,9 +1,8 @@ # Models -!!! warning "Warning" - This document is still a WIP. Mostly still in 'ad-hoc notes' format +## Current Models (skipping unused batteries for now) -## Models (skipping unused batteries for now) +This section contains ad-hoc notes on our current models, and intended changes. ### Column From cd6e38b12b49f41f299f27440e092152bfb2f7be Mon Sep 17 00:00:00 2001 From: Brent Moran Date: Mon, 11 Mar 2024 15:11:21 -0600 Subject: [PATCH 11/24] Add new model setup section --- docs/engineering/architecture/models.md | 241 ++++++++++++++++++++++-- 1 file changed, 228 insertions(+), 13 deletions(-) diff --git a/docs/engineering/architecture/models.md b/docs/engineering/architecture/models.md index 0a11bc94c..1fb3419e3 100644 --- a/docs/engineering/architecture/models.md +++ b/docs/engineering/architecture/models.md @@ -23,7 +23,6 @@ We need to replace functionality to get `ui_type` from DB type. To replace the dependent-getting functionality, we need to move the dependents module to SQL. - ### Constraint | Column | Type | @@ -56,15 +55,7 @@ Stores connection info to allow accessing a DB by creating an SQLAlchemy engine. Referenced by DatabaseRole and Schema models. -Replace this with these models: - -- a `Database` model containing (for now) only an ID, an OID, and a `database` (name). We could also add other metadata to this model as appropriate (consider keeping `editable`). -- a `DatabaseServerCredential` model containing a DB hostname, port, and username; and -- a `UserDatabaseMap` model containing a `user_id` fkey to the `User` model, a `database_id` fkey to the `Database` model, and a `database_server_credential_id` fkey to the `DatabaseServerCredential` model. The `UserDatabaseMap` model should additionally have a `database` field, and the `user_id`, `database` pair should be unique. - -We should eventually add functionality to store some details in a [`.pgpass`](https://www.postgresql.org/docs/current/libpq-pgpass.html) dotfile (though probably in a custom location). `psycopg` can inject the password automatically through these means. - -The Django permissions infrastructure should handle CRUD operations on `DatabaseServerCredential` and `UserDatabaseMap` resources. Actually accessing a database wouldn't require the permissions infrastructure; we'd instead construct a connection string by joining the appropriate `database` to the other info found by looking up the `user_id, database` pair. +Replace this with `Database`, `DatabaseServer`, `DatabaseServerCredential`, and `UserDatabaseMap` models. See the [New models](#new-model-setup) for details. ### DatabaseRole @@ -77,7 +68,7 @@ The Django permissions infrastructure should handle CRUD operations on `Database | database\_id | integer | | user\_id | integer | -This stores a role on a given database for a given user. We should delete this model, and instead have a many-to-many mapping between Mathesar users and database connections. All actual access should be handled by underlying DB permissions for the underlying user. +This stores a role on a given database for a given user. We will repurpose this, and it will be applied (for now) only to `UIQuery` instances namespaced under a given database. ### DataFile @@ -157,7 +148,7 @@ This should be deleted, and the permissions should be instead managed on the und | enabled | boolean | | query\_id | integer | -This model should stay. No changes here. We need to add metadata about which user created the share, so we can use that user's allowed DB credentials for accessing the underlying assets (tables, schemata, etc.) +This model should stay. No changes here. We need to add metadata about a credential for running the actual query. ### SharedTable @@ -241,4 +232,228 @@ This stores a definition of a stored query that can be run on command. The main | short\_name | character varying(255) | | password\_change\_needed | boolean | -This stores user metadata. I think we should mostly keep it as is. It will be referenced by the `DatabaseServerCredential` model. +This stores user metadata. I think we should mostly keep it as is. It will be referenced by the `UserDatabaseMap` model. + +## New Model setup + +Some of this is tentative pending decisions in our Users and Permissions work. We should be able to handle anything being discussed through simple extensions of this model framework. Also, these models are intended to get us to beta, while providing flexibility to move forward afterwards. There will be discussion of a desired next iteration at the end. + +### User + +| Column | Type | Notes | +|--------------------------|--------------------------|--------| +| id | integer | pkey | +| password | character varying(128) | | +| last\_login | timestamp with time zone | | +| is\_superuser | boolean | | +| username | character varying(150) | unique | +| email | character varying(254) | | +| is\_staff | boolean | | +| is\_active | boolean | | +| date\_joined | timestamp with time zone | | +| full\_name | character varying(255) | | +| short\_name | character varying(255) | | +| password\_change\_needed | boolean | | + +### DatabaseServer + +| Column | Type | Notes | +|-------------|--------------------------|--------------------------------| +| id | integer | pkey | +| created\_at | timestamp with time zone | | +| updated\_at | timestamp with time zone | | +| host | character varying | | +| port | | | + +`(host, port)` pair is unique. + +Theoretically, we could also split the host out, but that seems like premature optimization. + +### Database + +| Column | Type | Notes | +|---------------|--------------------------|-------------------------------| +| id | integer | pkey | +| created\_at | timestamp with time zone | | +| updated\_at | timestamp with time zone | | +| oid | integer | | +| db\_name | text | | +| display\_name | text | unique | +| db\_server | integer | references DatabaseServer(id) | +| editable | boolean | | + +`(db_server, oid)` pair is unique + +### DatabaseServerCredential + +| Column | Type | Notes | +|-------------|--------------------------|-----------------------------------------| +| id | integer | pkey | +| created\_at | timestamp with time zone | | +| updated\_at | timestamp with time zone | | +| username | character varying | | +| password | character varying | encrypted | +| db\_server | integer | optional; references DatabaseServer(id) | + +The `db_server` field would add a bit of security and firmly attach a role to a given database server. It's debatable whether we should do this or not, and depends on desired UX. The advantage would be to make sure that whenever a `DatabaseServer` is removed from Mathesar, any associated credentials are also removed. The downside is reduced flexibility: The detached version allows for management of a common username and password across multiple servers (this may be desired). The author prefers having this foreign key in the model. + +### UserDatabaseMap + +| Column | Type | Notes | +|-------------|--------------------------|-----------------------------------------| +| id | integer | pkey | +| created\_at | timestamp with time zone | | +| updated\_at | timestamp with time zone | | +| user | integer | references User(id) | +| database | integer | references Database(id) | +| role | integer | references DatabaseServerCredential(id) | + +`(user, database)` pair is unique. + +### Aside: Quick overview of connecting to a DB. + +The Django permissions infrastructure should handle CRUD operations on `Database`, `DatabaseServerCredential`, `DatabaseServer`, and `UserDatabaseMap` resources. Actually accessing a database wouldn't require the permissions infrastructure; we'd instead construct a connection string by joining the appropriate `database` to the other info found by looking up the `user_id, database` pair. For example, given a `(user, database)` pair like `(3, 8)`, we'd look up the appropriate row in the `UserDatabaseMap` model to find the `role` (referencing `DatabaseServerCredential`). We also follow the foreign key to the `Database` to pick up the `db_name` and then the foreign key to `DatabaseServer` to pick up the `host` and `port`. + +We should eventually add functionality to store some details in a [`.pgpass`](https://www.postgresql.org/docs/current/libpq-pgpass.html) dotfile (though probably in a custom location). `psycopg` can inject the password automatically through these means. + +### UIQuery + +| Column | Type | Notes | +|------------------|--------------------------|-------| +| id | integer | pkey | +| created\_at | timestamp with time zone | | +| updated\_at | timestamp with time zone | | +| database\_id | integer | | +| base\_table\_oid | integer | | +| name | character varying(128) | | +| description | text | | +| initial\_columns | jsonb | | +| transformations | jsonb | | +| display\_options | jsonb | | +| display\_names | jsonb | | + +- The JSONB columns are the same format, except now they refer to DB-layer ids, e.g., OIDs and attnums rather than Django-layer IDs. +- We should consider changing `display_options` to refer to instances of `ColumnMetadata` within the JSONB +- Permissions on this object will be derived from the `DatabaseUIQueryRole` via the `(database_id, user)` pair. + +### DatabaseUIQueryRole (consider different name) + +| Column | Type | Notes | +|--------------|--------------------------|-------------------------| +| id | integer | pkey | +| created\_at | timestamp with time zone | | +| updated\_at | timestamp with time zone | | +| role | character varying(10) | | +| database\_id | integer | references Database(id) | +| user\_id | integer | references User(id) | + +This will be used to store Manager, Editor, Viewer permissions on Explorations. The reason to keep this (derived from the current `DatabaseRole`) for now is to give us a namespace where we can organize permissions for `UIQuery` instances (Explorations in the current UI). If we want to use "Projects" for such a namespace, or derive these roles from the `DatabaseServerCredential` associated with a user on a `Database` via the `UserDatabaseMap`, then we will have to change or remove this model. More discussion needed here. + +### ColumnMetadata + +| Column | Type | Notes | +|-------------------------|--------------------------|-----------------------------------| +| id | integer | pkey | +| created\_at | timestamp with time zone | | +| updated\_at | timestamp with time zone | | +| database\_id | integer | References Database(id) | +| table\_oid | integer | | +| attnum | integer | | +| bool\_input | enum | ('dropdown', 'checkbox') | +| bool\_true | text | default: 'True' | +| bool\_false | text | default: 'False' | +| num\_min\_frac\_digits | integer | min: 0, max: 20 | +| num\_max\_frac\_digits | integer | min: 0, max: 20 | +| num\_show\_as\_perc | boolean | Default: false | +| mon\_currency\_symbol | text | Default? | +| mon\_currency\_location | enum | ('after-minus', 'end-with-space') | +| time\_format | text | | +| date\_format | text | | +| duration\_min | character varying(255) | | +| duration\_max | character varying(255) | | +| duration\_show\_units | boolean | | + +- The `(database_id, table_oid, attnum)` tuple should be unique. +- Depending on Django's support for multicolumn `CHECK` constraints, we should ensure that `num_min_frac_digits < num_max_frac_digits`. +- This has a number of fields to replace the current JSON storage of display options, and remove the need for the polymorphic serializer. +- The only foreign key we reference is the `Database(id)`, needed to map to a specific database where we find the relevant table and column. +- We don't need to reference any `schema_oid`, since a `(table_oid, attnum)` pair is unique per DB. +- Permissions to manipulate instances of this model would be derived from permissions to manipulate the relevant table and column in the underlying database. + +### TableMetadata + +| Column | Type | Notes | +|---------------------|--------------------------|-------------------------| +| id | integer | pkey | +| created\_at | timestamp with time zone | | +| updated\_at | timestamp with time zone | | +| database\_id | integer | references Database(id) | +| table\_oid | integer | | +| import\_verified | boolean | | +| is\_temp | boolean | | +| import\_target\_oid | integer | | +| column\_order | jsonb | | +| preview\_customized | boolean | | +| preview\_template | character varying(255) | | + +I've left the preview template in the Mathesar layer. The hope is that we can find a sufficiently featureful and also sufficiently efficient algorithm for getting the template, thereby avoiding needing to move this down into the user Database. There will be more discussion of this below. Permissions to manipulate this should be derived from permissions on the relevant table in the underlying database. + +### DataFile + +| Column | Type | +|-------------------------|--------------------------| +| id | integer | +| created\_at | timestamp with time zone | +| updated\_at | timestamp with time zone | +| file | character varying(100) | +| created\_from | character varying(128) | +| base\_name | character varying(100) | +| header | boolean | +| delimiter | character varying(1) | +| escapechar | character varying(1) | +| quotechar | character varying(1) | +| user\_id | integer | +| type | character varying(128) | +| max\_level | integer | +| sheet\_index | integer | + +When we have our desired logic for cleaning this up sorted out, we should consider removing this model. It's currently only used ephemerally, but then the actuaul instance hangs around indefinitely. + +### SharedQuery + +| Column | Type | Notes | +|-------------|--------------------------|-----------------------------------------| +| id | integer | pkey | +| created\_at | timestamp with time zone | | +| updated\_at | timestamp with time zone | | +| slug | uuid | unique | +| enabled | boolean | | +| query\_id | integer | | +| credential | integer | references DatabaseServerCredential(id) | + +I've chosen to store the credential id, rather than the creating user, for flexibility. We can derive this from a creating user at the time the query is created, and could (theoretically) update it if the User's credential for a given DB changes (I wouldn't recommend this). + +### SharedTable + +| Column | Type | Notes | +|-------------|--------------------------|-----------------------------------------| +| id | integer | pkey | +| created\_at | timestamp with time zone | | +| updated\_at | timestamp with time zone | | +| slug | uuid | unique | +| enabled | boolean | | +| table\_oid | integer | | +| credential | integer | references DatabaseServerCredential(id) | + +## After-beta-term vision + +For the beta, I'm hoping to avoid some work by keeping things in the Mathesar service models that I'd rather store in the underlying User Databases in a `msar_catalog` schema. The relevant models are `ColumnMetadata` and `TableMetadata`. A big motivation to move this info to the User DB is performance w.r.t. the table previews. Our current algorithm requires lots of back-and-forth between the service layer and the User DB in order to recursively build these preview templates, and to fill them. I also think it's more natural to keep these metadata models in the User DB, since they're segregated by User DB, and each instance only refers to objects on that underlying database. + +I also think in the even longer term that we should think about storing our `UIQuery` info on the underlying database in the form of views (perhaps in a special `msar_queries` schema). This presents some technical problems, however, that we haven't yet solved. + +### What about names vs. OIDs? + +I thought about adding another model to store a general map of names to OIDs for use when resolving missing tables, etc. This would be useful if someone drops and recreates a table, or when trying to export your Mathesar Explorations or Display Settings. I didn't add that at this stage, since: + +- We'd use the underlying User DB for that map if we move the Metadata models down to the UserDB, and +- We aren't prioritizing the features requiring being able to export and reimport your Explorations for beta. From ce45578f41b04f230faa65d69fcbaa101e20c0e6 Mon Sep 17 00:00:00 2001 From: Brent Moran Date: Mon, 11 Mar 2024 15:25:03 -0600 Subject: [PATCH 12/24] add detail to main Architecture page --- docs/engineering/architecture/index.md | 36 ++++++++++++++++++++------ 1 file changed, 28 insertions(+), 8 deletions(-) diff --git a/docs/engineering/architecture/index.md b/docs/engineering/architecture/index.md index a9459d346..bca95133e 100644 --- a/docs/engineering/architecture/index.md +++ b/docs/engineering/architecture/index.md @@ -12,14 +12,24 @@ We need a couple of terms to get started: ## Bird's Eye View -The main goals of this architectural redesign are to improve the speed and reduce the complexity of the back end. Secondary goals are to improve the convenience of the API for the use case of our front end, and make it easier for users and contributors to identify which back end code supports a given front end feature. We plan to do this by: +The main goals of this architectural redesign are to + +- improve the speed of the back end, and +- reduce the complexity of the back end. + +Secondary goals are to + +- improve the convenience of the API for the use case of our front end, and +- make it easier for users and contributors to identify which back end code supports a given front end feature. + +We plan to do this by: - Removing Django models representing User DB objects (e.g., tables), replacing them with functions that query and act on those DB objects directly. Insofar as we need to enrich the metadata about User DB objects with Mathesar-managed metadata, we'll do that after gathering the relevant info from the User DB. - Changing our API to use a JSON-RPC (2.0) spec. ## A Motivating Example: Getting the table info in a schema -To get the table info in a schema, +To get the table info in a schema using our current architecture, 1. The front end calls the endpoint `GET /api/db/v0/tables/` using a query string parameter to filter results from that endpoint based on a schema (identified by a Django-assigned integer id). Internally, then the following happens: @@ -72,7 +82,7 @@ With the new architecture, to get the same info, - `type_options` - `preview_settings` -- describes how we should show each table's rows when it's linked to by a foreign key -1. Using the returned info, the web service filters a `TableMetadata` model based on returned `oid`s, and gathers +1. Using the returned info, the web service filters a `TableMetadata` model based on the passed `database` and returned `oid`s, and gathers - `import_verified` - `is_temp` @@ -100,8 +110,18 @@ For each API call, there should be an identifiable DB function that performs all To achieve this, we will install the following on the user database(s) -- Some custom Mathesar types. -- Some tables holding metadata that needs to be kept near the tables. +- Some custom Mathesar types. These are used to validate passed JSON at the User DB level. For example, we create the type + ``` + TYPE __msar.col_def AS ( + name_ text, -- The name of the column to create, quoted. + type_ text, -- The type of the column to create, fully specced with arguments. + not_null boolean, -- A boolean to describe whether the column is nullable or not. + default_ text, -- Text SQL giving the default value for the column. + identity_ boolean, -- A boolean giving whether the column is an identity pkey column. + description text -- A text that will become a comment for the column + ) + ``` + This type describes and validates the column info we need to add that column to a table. - A set of functions that provide the bulk of Mathesar's back end logic and functionality. ### Python `db` library @@ -112,7 +132,7 @@ This library should mostly serve to provide thin wrapper functions around the Us This service should provide an JSON-RPC API for use by the front end. When an API function is called, the service should: -- Grab an appropriate engine from the `DatabaseConnection` model by combining the user associated with the request with the `database`. +- Grab an appropriate engine from the by combining the `user` associated with the request with the `database`. See the [models](models.md) page for more detail. - Call the relevant `db` library function (should be only one in most cases). - Gathers data from the service database via models if needed (this is for metadata that's inappropriate for storage in the User DB for some reason) - Returns it to the API @@ -125,13 +145,13 @@ This service should provide an JSON-RPC API for use by the front end. When an AP ### Example -A user lists the columns for a table. Because they have access to read the columns of the table (checked on the DB), they can read display options for a table. If they have access to modify a column of a table, they have access to modify the relevant display options. This works as long as there isn't a dedicated `display_options` endpoint which could receive requests directly. +A user lists the columns for a table. Because they have access to read the columns of the table (checked on the DB), they can read display options for a table. If they have access to modify a column of a table, they have access to modify the relevant display options. This works as long as there isn't a dedicated `display_options` endpoint which could receive requests directly. Even in that case, we could add logic to check permissions on the relevant User DB. ### Exceptions There are some metadata and other models that we'll be keeping which _can_ receive direct requests. Currently, these are: -- Database connections +- Database connection and credential info - Shareable links - Explorations From 254f9a30f46ddc269e7b7b03b54ce5c4939706f3 Mon Sep 17 00:00:00 2001 From: Brent Moran Date: Mon, 11 Mar 2024 15:39:01 -0600 Subject: [PATCH 13/24] update permissions description with new models --- docs/engineering/architecture/permissions.md | 27 ++++++++++++++++---- 1 file changed, 22 insertions(+), 5 deletions(-) diff --git a/docs/engineering/architecture/permissions.md b/docs/engineering/architecture/permissions.md index 5f8421928..87fbfaeb0 100644 --- a/docs/engineering/architecture/permissions.md +++ b/docs/engineering/architecture/permissions.md @@ -6,6 +6,10 @@ The big picture is that we will implement access management for DB objects (incl This model will be reduced to store nothing more than metadata about a given database, as well as its name for use when constructing an actual connection to that database. +## Database Server Model + +This model holds the `host` and `port` information for a given database. + ## Database Server Credential Model As described in [the models section](./models.md), This will store the authentication information needed to create an engine, but won't provide a database (a required part of a connection definition in PostgreSQL). Because the information includes a role (to define the initial connection role), it necessarily defines a set of privileges available on the database with that connection. @@ -16,7 +20,7 @@ This map uses a `user_id, database` pair to look up the needed credentials, then ## Adding Connections (backend perspective) -Regardless of UI, the backend should receive a `POST` request to the new RPC endpoint defining a new connection. We'll use the same functions set up in [PR \#3348](https://github.com/mathesar-foundation/mathesar/pull/3348). The relevant info should be stored in the `Database` and `DatabaseServerCredential` models. Also, if the user does not already have an entry defining their role on the given database, we could create such an entry automatically in the `UserDatabaseMap` model. This is optional, and doesn't really affect the architecture. Then, to let some other users access that connection, we will provide an RPC function that lets an admin set different users' credentials for the given database by creating or updating `UserDatabaseMap` resources. Note that this does _not_ directly modify anything to do with permissions on actual database objects (e.g., schemata or tables). +Regardless of UI, the backend should receive a `POST` request to the new RPC endpoint defining a new connection. We'll use the same functions set up in [PR \#3348](https://github.com/mathesar-foundation/mathesar/pull/3348). The relevant info should be stored in the `Database`, `DatabaseServer`, and `DatabaseServerCredential` models. Also, if the user does not already have an entry defining their role on the given database, we could create such an entry automatically in the `UserDatabaseMap` model. This is optional, and doesn't really affect the architecture. Then, to let some other users access that connection, we will provide an RPC function that lets an admin set different users' credentials for the given database by creating or updating `UserDatabaseMap` resources. Note that this does _not_ directly modify anything to do with permissions on actual database objects (e.g., schemata or tables). ## Granting database object privileges @@ -35,23 +39,36 @@ The backend will provide RPC functions that let an admin (who has access to a su - `GRANT` the owning DB role (ostensibly a user) to DB users connectable by the relevant Mathesar users. - Have at least one DB superuser available for use with a connection, and give relevant Mathesar users access to that DB superuser. -## Multi-user metadata object privileges +## Metadata object privileges -Examples of these are table properties like preview columns. We've decided these should be per-table, and so this metadata will be stored on the user DB in a `msar_cat` namespace. Thus, modifying this metadata will occur in a request to modify the associated table, and the privilege check will be whether the relevant DB user is allowed to `ALTER` that table. +Examples of these are table properties like preview columns. Permissions for these are derived from permissions on relevant User DB objects. So, if a user wants to modify the preview template for a table, they would be checked against the permission to modify the underlying table structure. Thus, modifying this metadata will occur in a request to modify the associated table, and the privilege check will be whether the relevant DB user is allowed to `ALTER` that table. ## Standalone Mathesar objects An example of this would be an Exploration. The privileges needed to actually _run_ the query are handled on the database as described above, but a user should have access to look at the exploration definition itself (as well as edit/delete) handled by the Django access policy framework. It seems like our current Manager/Editor/Viewer framework should suffice. -In the case of Explorations (UIQuery model), this will be derived from a policy scoped via the `Database` associated with the Exploration: +### Current plan + +In the case of Explorations (`UIQuery` model), this will be derived from a policy scoped via the `Database` associated with the Exploration: + +- A super user of Mathesar will set the policy for a given `User` on a given `Database` instance via the UI. + - When a `User` wants to view/edit/manage an Exploration, the web service will check the `user, database` pair (where `database`) is the `Database` associated with the Exploration to find a relevant policy. + - Based on the policy, the user can then act on the Exploration. + +### Alternative plan + +In the case of Explorations (`UIQuery` model), this will be derived from a policy scoped via the `Database` associated with the Exploration: + - A super user of Mathesar will set the policy for `DatabaseServerCredential` instances via the UI. - Each credential instance represents a Role on the DB. - When a User wants to view/edit/manage an Exploration, the web service will check the `user, database` pair (where `database`) is the Database associated with the Exploration to get a `DatabaseServerCredential` if one exists (otherwise, no permissions are granted) - Based on the policy applied to that credential, the user can then act on the Exploration. +This plan would require a minor change to the [models](models.md), but isn't very difficult to implement. The author considers it a UX question which way we go on this. + ## Shared links -A shared table or exploration needs access to a `DatabaseConnection` to run (or be viewed). We should bypass any permission checks when actually retrieving data. The shared object model should include an attribute giving the `DatabaseConnection` instance needed to run (or view). +A shared table or exploration needs access to a `Database` and `DatabaseServerCredential` to run (or be viewed). We should bypass any permission checks when actually retrieving data. The shared object model should include an attribute giving the `DatabaseServerCredential` instance needed to run (or view). !!! danger "Tech/Product concern" The safest (and easiest) implementation would be to have specialized view-only users who are `GRANT`ed `SELECT` on relevant DB objects when needed. Then, sharable links would _only_ use those users. But, we'll need to justify this choice in documentation somewhere. From 938f168e1af57221fb25c1d22a615bc3d76bb6cd Mon Sep 17 00:00:00 2001 From: Brent Moran Date: Tue, 26 Mar 2024 17:00:13 -0500 Subject: [PATCH 14/24] remove _id where it occurs, note not null in models --- docs/engineering/architecture/models.md | 190 ++++++++++++------------ 1 file changed, 96 insertions(+), 94 deletions(-) diff --git a/docs/engineering/architecture/models.md b/docs/engineering/architecture/models.md index 1fb3419e3..598269548 100644 --- a/docs/engineering/architecture/models.md +++ b/docs/engineering/architecture/models.md @@ -240,49 +240,51 @@ Some of this is tentative pending decisions in our Users and Permissions work. W ### User -| Column | Type | Notes | -|--------------------------|--------------------------|--------| -| id | integer | pkey | -| password | character varying(128) | | -| last\_login | timestamp with time zone | | -| is\_superuser | boolean | | -| username | character varying(150) | unique | -| email | character varying(254) | | -| is\_staff | boolean | | -| is\_active | boolean | | -| date\_joined | timestamp with time zone | | -| full\_name | character varying(255) | | -| short\_name | character varying(255) | | -| password\_change\_needed | boolean | | +| Column | Type | Notes | +|--------------------------|--------------------------|------------------| +| id | integer | pkey | +| password | character varying(128) | not null | +| last\_login | timestamp with time zone | | +| is\_superuser | boolean | | +| username | character varying(150) | not null; unique | +| email | character varying(254) | | +| is\_staff | boolean | | +| is\_active | boolean | | +| date\_joined | timestamp with time zone | | +| full\_name | character varying(255) | | +| short\_name | character varying(255) | | +| password\_change\_needed | boolean | | ### DatabaseServer -| Column | Type | Notes | -|-------------|--------------------------|--------------------------------| -| id | integer | pkey | -| created\_at | timestamp with time zone | | -| updated\_at | timestamp with time zone | | -| host | character varying | | -| port | | | +| Column | Type | Notes | +|-------------|--------------------------|----------| +| id | integer | pkey | +| created\_at | timestamp with time zone | | +| updated\_at | timestamp with time zone | | +| host | character varying | not null | +| port | integer | not null | `(host, port)` pair is unique. Theoretically, we could also split the host out, but that seems like premature optimization. +We could consider making the `host` and `port` nullable when we're supporting `.pgpass`. + ### Database -| Column | Type | Notes | -|---------------|--------------------------|-------------------------------| -| id | integer | pkey | -| created\_at | timestamp with time zone | | -| updated\_at | timestamp with time zone | | -| oid | integer | | -| db\_name | text | | -| display\_name | text | unique | -| db\_server | integer | references DatabaseServer(id) | -| editable | boolean | | +| Column | Type | Notes | +|---------------|--------------------------|-----------------------------------------| +| id | integer | pkey | +| created\_at | timestamp with time zone | | +| updated\_at | timestamp with time zone | | +| oid | integer | not null | +| db\_name | text | not null | +| display\_name | text | not null; unique | +| db\_server | integer | not null; references DatabaseServer(id) | +| editable | boolean | | -`(db_server, oid)` pair is unique +`(db_server, oid)` pair is unique, or `(db_server, db_name)` is. Should consider whether we should even store the `oid`, since we can't look up the `db_name` to connect without ... connecting, which requires the `db_name` rather than the `oid`. We could consider making `db_name` nullable when supporting `.pgpass` ### DatabaseServerCredential @@ -291,11 +293,11 @@ Theoretically, we could also split the host out, but that seems like premature o | id | integer | pkey | | created\_at | timestamp with time zone | | | updated\_at | timestamp with time zone | | -| username | character varying | | -| password | character varying | encrypted | -| db\_server | integer | optional; references DatabaseServer(id) | +| username | character varying | not null | +| password | character varying | encrypted; not null | +| db\_server | integer | not null; references DatabaseServer(id) | -The `db_server` field would add a bit of security and firmly attach a role to a given database server. It's debatable whether we should do this or not, and depends on desired UX. The advantage would be to make sure that whenever a `DatabaseServer` is removed from Mathesar, any associated credentials are also removed. The downside is reduced flexibility: The detached version allows for management of a common username and password across multiple servers (this may be desired). The author prefers having this foreign key in the model. +We could consider making `username` and `password` nullable when supporting `.pgpass`. ### UserDatabaseMap @@ -312,40 +314,40 @@ The `db_server` field would add a bit of security and firmly attach a role to a ### Aside: Quick overview of connecting to a DB. -The Django permissions infrastructure should handle CRUD operations on `Database`, `DatabaseServerCredential`, `DatabaseServer`, and `UserDatabaseMap` resources. Actually accessing a database wouldn't require the permissions infrastructure; we'd instead construct a connection string by joining the appropriate `database` to the other info found by looking up the `user_id, database` pair. For example, given a `(user, database)` pair like `(3, 8)`, we'd look up the appropriate row in the `UserDatabaseMap` model to find the `role` (referencing `DatabaseServerCredential`). We also follow the foreign key to the `Database` to pick up the `db_name` and then the foreign key to `DatabaseServer` to pick up the `host` and `port`. +The Django permissions infrastructure should handle CRUD operations on `Database`, `DatabaseServerCredential`, `DatabaseServer`, and `UserDatabaseMap` resources. Actually accessing a database wouldn't require the permissions infrastructure; we'd instead construct a connection string by joining the appropriate `database` to the other info found by looking up the `user, database` pair. For example, given a `(user, database)` pair like `(3, 8)`, we'd look up the appropriate row in the `UserDatabaseMap` model to find the `role` (referencing `DatabaseServerCredential`). We also follow the foreign key to the `Database` to pick up the `db_name` and then the foreign key to `DatabaseServer` to pick up the `host` and `port`. We should eventually add functionality to store some details in a [`.pgpass`](https://www.postgresql.org/docs/current/libpq-pgpass.html) dotfile (though probably in a custom location). `psycopg` can inject the password automatically through these means. ### UIQuery -| Column | Type | Notes | -|------------------|--------------------------|-------| -| id | integer | pkey | -| created\_at | timestamp with time zone | | -| updated\_at | timestamp with time zone | | -| database\_id | integer | | -| base\_table\_oid | integer | | -| name | character varying(128) | | -| description | text | | -| initial\_columns | jsonb | | -| transformations | jsonb | | -| display\_options | jsonb | | -| display\_names | jsonb | | +| Column | Type | Notes | +|------------------|--------------------------|-------------------------| +| id | integer | pkey | +| created\_at | timestamp with time zone | | +| updated\_at | timestamp with time zone | | +| database | integer | references Database(id) | +| base\_table\_oid | integer | not null | +| name | character varying(128) | not null; unique | +| description | text | | +| initial\_columns | jsonb | not null | +| transformations | jsonb | | +| display\_options | jsonb | | +| display\_names | jsonb | | - The JSONB columns are the same format, except now they refer to DB-layer ids, e.g., OIDs and attnums rather than Django-layer IDs. - We should consider changing `display_options` to refer to instances of `ColumnMetadata` within the JSONB -- Permissions on this object will be derived from the `DatabaseUIQueryRole` via the `(database_id, user)` pair. +- Permissions on this object will be derived from the `DatabaseUIQueryRole` via the `(database, user)` pair. ### DatabaseUIQueryRole (consider different name) -| Column | Type | Notes | -|--------------|--------------------------|-------------------------| -| id | integer | pkey | -| created\_at | timestamp with time zone | | -| updated\_at | timestamp with time zone | | -| role | character varying(10) | | -| database\_id | integer | references Database(id) | -| user\_id | integer | references User(id) | +| Column | Type | Notes | +|-------------|--------------------------|-----------------------------------| +| id | integer | pkey | +| created\_at | timestamp with time zone | | +| updated\_at | timestamp with time zone | | +| role | character varying(10) | | +| database | integer | not null; references Database(id) | +| user | integer | not null; references User(id) | This will be used to store Manager, Editor, Viewer permissions on Explorations. The reason to keep this (derived from the current `DatabaseRole`) for now is to give us a namespace where we can organize permissions for `UIQuery` instances (Explorations in the current UI). If we want to use "Projects" for such a namespace, or derive these roles from the `DatabaseServerCredential` associated with a user on a `Database` via the `UserDatabaseMap`, then we will have to change or remove this model. More discussion needed here. @@ -356,9 +358,9 @@ This will be used to store Manager, Editor, Viewer permissions on Explorations. | id | integer | pkey | | created\_at | timestamp with time zone | | | updated\_at | timestamp with time zone | | -| database\_id | integer | References Database(id) | -| table\_oid | integer | | -| attnum | integer | | +| database | integer | not null; References Database(id) | +| table\_oid | integer | not null | +| attnum | integer | not null | | bool\_input | enum | ('dropdown', 'checkbox') | | bool\_true | text | default: 'True' | | bool\_false | text | default: 'False' | @@ -373,7 +375,7 @@ This will be used to store Manager, Editor, Viewer permissions on Explorations. | duration\_max | character varying(255) | | | duration\_show\_units | boolean | | -- The `(database_id, table_oid, attnum)` tuple should be unique. +- The `(database, table_oid, attnum)` tuple should be unique. - Depending on Django's support for multicolumn `CHECK` constraints, we should ensure that `num_min_frac_digits < num_max_frac_digits`. - This has a number of fields to replace the current JSON storage of display options, and remove the need for the polymorphic serializer. - The only foreign key we reference is the `Database(id)`, needed to map to a specific database where we find the relevant table and column. @@ -382,40 +384,40 @@ This will be used to store Manager, Editor, Viewer permissions on Explorations. ### TableMetadata -| Column | Type | Notes | -|---------------------|--------------------------|-------------------------| -| id | integer | pkey | -| created\_at | timestamp with time zone | | -| updated\_at | timestamp with time zone | | -| database\_id | integer | references Database(id) | -| table\_oid | integer | | -| import\_verified | boolean | | -| is\_temp | boolean | | -| import\_target\_oid | integer | | -| column\_order | jsonb | | -| preview\_customized | boolean | | -| preview\_template | character varying(255) | | +| Column | Type | Notes | +|---------------------|--------------------------|-----------------------------------| +| id | integer | pkey | +| created\_at | timestamp with time zone | | +| updated\_at | timestamp with time zone | | +| database | integer | not null; references Database(id) | +| table\_oid | integer | not null | +| import\_verified | boolean | | +| is\_temp | boolean | | +| import\_target\_oid | integer | | +| column\_order | jsonb | | +| preview\_customized | boolean | | +| preview\_template | character varying(255) | | I've left the preview template in the Mathesar layer. The hope is that we can find a sufficiently featureful and also sufficiently efficient algorithm for getting the template, thereby avoiding needing to move this down into the user Database. There will be more discussion of this below. Permissions to manipulate this should be derived from permissions on the relevant table in the underlying database. ### DataFile -| Column | Type | -|-------------------------|--------------------------| -| id | integer | -| created\_at | timestamp with time zone | -| updated\_at | timestamp with time zone | -| file | character varying(100) | -| created\_from | character varying(128) | -| base\_name | character varying(100) | -| header | boolean | -| delimiter | character varying(1) | -| escapechar | character varying(1) | -| quotechar | character varying(1) | -| user\_id | integer | -| type | character varying(128) | -| max\_level | integer | -| sheet\_index | integer | +| Column | Type | Notes | +|---------------|--------------------------|----------| +| id | integer | pkey | +| created\_at | timestamp with time zone | | +| updated\_at | timestamp with time zone | | +| file | character varying(100) | not null | +| created\_from | character varying(128) | | +| base\_name | character varying(100) | | +| header | boolean | | +| delimiter | character varying(1) | | +| escapechar | character varying(1) | | +| quotechar | character varying(1) | | +| user | integer | | +| type | character varying(128) | | +| max\_level | integer | | +| sheet\_index | integer | | When we have our desired logic for cleaning this up sorted out, we should consider removing this model. It's currently only used ephemerally, but then the actuaul instance hangs around indefinitely. @@ -428,7 +430,7 @@ When we have our desired logic for cleaning this up sorted out, we should consid | updated\_at | timestamp with time zone | | | slug | uuid | unique | | enabled | boolean | | -| query\_id | integer | | +| query | integer | not null; references UIQuery(id) | | credential | integer | references DatabaseServerCredential(id) | I've chosen to store the credential id, rather than the creating user, for flexibility. We can derive this from a creating user at the time the query is created, and could (theoretically) update it if the User's credential for a given DB changes (I wouldn't recommend this). @@ -442,7 +444,7 @@ I've chosen to store the credential id, rather than the creating user, for flexi | updated\_at | timestamp with time zone | | | slug | uuid | unique | | enabled | boolean | | -| table\_oid | integer | | +| table\_oid | integer | not null | | credential | integer | references DatabaseServerCredential(id) | ## After-beta-term vision From 5e1802f2938ae9a1c8f9e923f93a7be545e2d632 Mon Sep 17 00:00:00 2001 From: Brent Moran Date: Wed, 27 Mar 2024 12:13:52 -0500 Subject: [PATCH 15/24] remove role model, add field to role map to handle metadata --- docs/engineering/architecture/models.md | 42 +++++++++---------------- 1 file changed, 15 insertions(+), 27 deletions(-) diff --git a/docs/engineering/architecture/models.md b/docs/engineering/architecture/models.md index 598269548..d14581b37 100644 --- a/docs/engineering/architecture/models.md +++ b/docs/engineering/architecture/models.md @@ -55,7 +55,7 @@ Stores connection info to allow accessing a DB by creating an SQLAlchemy engine. Referenced by DatabaseRole and Schema models. -Replace this with `Database`, `DatabaseServer`, `DatabaseServerCredential`, and `UserDatabaseMap` models. See the [New models](#new-model-setup) for details. +Replace this with `Database`, `DatabaseServer`, `DatabaseServerCredential`, and `UserDatabaseRoleMap` models. See the [New models](#new-model-setup) for details. ### DatabaseRole @@ -232,7 +232,7 @@ This stores a definition of a stored query that can be run on command. The main | short\_name | character varying(255) | | password\_change\_needed | boolean | -This stores user metadata. I think we should mostly keep it as is. It will be referenced by the `UserDatabaseMap` model. +This stores user metadata. I think we should mostly keep it as is. It will be referenced by the `UserDatabaseRoleMap` model. ## New Model setup @@ -299,22 +299,23 @@ We could consider making the `host` and `port` nullable when we're supporting `. We could consider making `username` and `password` nullable when supporting `.pgpass`. -### UserDatabaseMap +### UserDatabaseRoleMap -| Column | Type | Notes | -|-------------|--------------------------|-----------------------------------------| -| id | integer | pkey | -| created\_at | timestamp with time zone | | -| updated\_at | timestamp with time zone | | -| user | integer | references User(id) | -| database | integer | references Database(id) | -| role | integer | references DatabaseServerCredential(id) | +| Column | Type | Notes | +|----------------|--------------------------|-----------------------------------------| +| id | integer | pkey | +| created\_at | timestamp with time zone | | +| updated\_at | timestamp with time zone | | +| user | integer | references User(id) | +| database | integer | references Database(id) | +| database\_role | integer | references DatabaseServerCredential(id) | +| metadata\_role | enum | ('read only', 'read write') | -`(user, database)` pair is unique. +`(user, database)` pair is unique. The `metadata_role` isn't likely to be technically implemented as an `enum` on the DB for now. We'll use a Django-managed `TextChoices` field to save implementation time. See the current `DatabaseRole` model and its interaction with the `Role` class for an example. ### Aside: Quick overview of connecting to a DB. -The Django permissions infrastructure should handle CRUD operations on `Database`, `DatabaseServerCredential`, `DatabaseServer`, and `UserDatabaseMap` resources. Actually accessing a database wouldn't require the permissions infrastructure; we'd instead construct a connection string by joining the appropriate `database` to the other info found by looking up the `user, database` pair. For example, given a `(user, database)` pair like `(3, 8)`, we'd look up the appropriate row in the `UserDatabaseMap` model to find the `role` (referencing `DatabaseServerCredential`). We also follow the foreign key to the `Database` to pick up the `db_name` and then the foreign key to `DatabaseServer` to pick up the `host` and `port`. +The Django permissions infrastructure should handle CRUD operations on `Database`, `DatabaseServerCredential`, `DatabaseServer`, and `UserDatabaseRoleMap` resources. Actually accessing a database wouldn't require the permissions infrastructure; we'd instead construct a connection string by joining the appropriate `database` to the other info found by looking up the `user, database` pair. For example, given a `(user, database)` pair like `(3, 8)`, we'd look up the appropriate row in the `UserDatabaseRoleMap` model to find the `role` (referencing `DatabaseServerCredential`). We also follow the foreign key to the `Database` to pick up the `db_name` and then the foreign key to `DatabaseServer` to pick up the `host` and `port`. We should eventually add functionality to store some details in a [`.pgpass`](https://www.postgresql.org/docs/current/libpq-pgpass.html) dotfile (though probably in a custom location). `psycopg` can inject the password automatically through these means. @@ -336,20 +337,7 @@ We should eventually add functionality to store some details in a [`.pgpass`](ht - The JSONB columns are the same format, except now they refer to DB-layer ids, e.g., OIDs and attnums rather than Django-layer IDs. - We should consider changing `display_options` to refer to instances of `ColumnMetadata` within the JSONB -- Permissions on this object will be derived from the `DatabaseUIQueryRole` via the `(database, user)` pair. - -### DatabaseUIQueryRole (consider different name) - -| Column | Type | Notes | -|-------------|--------------------------|-----------------------------------| -| id | integer | pkey | -| created\_at | timestamp with time zone | | -| updated\_at | timestamp with time zone | | -| role | character varying(10) | | -| database | integer | not null; references Database(id) | -| user | integer | not null; references User(id) | - -This will be used to store Manager, Editor, Viewer permissions on Explorations. The reason to keep this (derived from the current `DatabaseRole`) for now is to give us a namespace where we can organize permissions for `UIQuery` instances (Explorations in the current UI). If we want to use "Projects" for such a namespace, or derive these roles from the `DatabaseServerCredential` associated with a user on a `Database` via the `UserDatabaseMap`, then we will have to change or remove this model. More discussion needed here. +- Permissions on this object will be derived from the `UserDatabaseRoleMap.metadata_role` via the `(database, user)` pair. ### ColumnMetadata From 548f84f4bcc5785d27522d4b786e29ea79ce45a1 Mon Sep 17 00:00:00 2001 From: Brent Moran Date: Wed, 27 Mar 2024 13:20:36 -0500 Subject: [PATCH 16/24] Restructure to move old models to their own page --- docs/engineering/architecture/models.md | 262 +------------------- docs/engineering/architecture/old_models.md | 233 +++++++++++++++++ 2 files changed, 246 insertions(+), 249 deletions(-) create mode 100644 docs/engineering/architecture/old_models.md diff --git a/docs/engineering/architecture/models.md b/docs/engineering/architecture/models.md index d14581b37..435fa8e47 100644 --- a/docs/engineering/architecture/models.md +++ b/docs/engineering/architecture/models.md @@ -1,244 +1,8 @@ # Models -## Current Models (skipping unused batteries for now) - -This section contains ad-hoc notes on our current models, and intended changes. - -### Column - -| Column | Type | -|------------------|--------------------------| -| id | integer | -| created\_at | timestamp with time zone | -| updated\_at | timestamp with time zone | -| attnum | integer | -| display\_options | jsonb | -| table\_id | integer | - -The only actual info here is the display options for a given column, stored as a JSON blob. Rename to `ColumnMetadata`, restructure to validate display options, delete fkey fields. Consider moving to `ma_catalog` table on user DB. - -We need to handle updating the table preview template when a new column is added (or rethink the implementation of this functionality) - -We need to replace functionality to get `ui_type` from DB type. - -To replace the dependent-getting functionality, we need to move the dependents module to SQL. - -### Constraint - -| Column | Type | -|-------------|--------------------------| -| id | integer | -| created\_at | timestamp with time zone | -| updated\_at | timestamp with time zone | -| oid | integer | -| table\_id | integer | - -Nothing actually stored here. Delete this model. All functionality can be contained in User DB functions. - -### Database - -| Column | Type | -|-------------|--------------------------| -| id | integer | -| created\_at | timestamp with time zone | -| updated\_at | timestamp with time zone | -| name | character varying(128) | -| deleted | boolean | -| db\_name | character varying(128) | -| editable | boolean | -| host | character varying(255) | -| password | text | -| port | integer | -| username | text | - -Stores connection info to allow accessing a DB by creating an SQLAlchemy engine. - -Referenced by DatabaseRole and Schema models. - -Replace this with `Database`, `DatabaseServer`, `DatabaseServerCredential`, and `UserDatabaseRoleMap` models. See the [New models](#new-model-setup) for details. - -### DatabaseRole - -| Column | Type | -|--------------|--------------------------| -| id | integer | -| created\_at | timestamp with time zone | -| updated\_at | timestamp with time zone | -| role | character varying(10) | -| database\_id | integer | -| user\_id | integer | - -This stores a role on a given database for a given user. We will repurpose this, and it will be applied (for now) only to `UIQuery` instances namespaced under a given database. - -### DataFile - -| Column | Type | -|-------------------------|--------------------------| -| id | integer | -| created\_at | timestamp with time zone | -| updated\_at | timestamp with time zone | -| file | character varying(100) | -| created\_from | character varying(128) | -| base\_name | character varying(100) | -| header | boolean | -| delimiter | character varying(1) | -| escapechar | character varying(1) | -| quotechar | character varying(1) | -| table\_imported\_to\_id | integer | -| user\_id | integer | -| type | character varying(128) | -| max\_level | integer | -| sheet\_index | integer | - -This stores metadata about files which have been uploaded for import into Mathesar. We should keep this model. `table_imported_to_id` should be removed (it's not used anywhere is it?). Also `max_level` seems like less of a data file attribute and more of an import setting. - -### PreviewColumnSettings - -| Column | Type | -|-------------|--------------------------| -| id | integer | -| created\_at | timestamp with time zone | -| updated\_at | timestamp with time zone | -| customized | boolean | -| template | character varying(255) | - -This stores the template defining what should be shown in a referencing fkey column for this table. This would be _much_ better as a `ma_catalog` table for efficiency reasons. - -In that case, a table's preview settings would be "global", i.e., it would be attached to the table rather than a user, table pair. - -Referenced by TableSettings. - -### Schema - -| Column | Type | -|--------------|--------------------------| -| id | integer | -| created\_at | timestamp with time zone | -| updated\_at | timestamp with time zone | -| oid | integer | -| database\_id | integer | - -Nothing stored here. - -Referenced by SchemaRole and Table models. - -Delete this model. All permissions handled by the referencing SchemaRole should instead be handled by the underlying user's permissions on the actual schema in the DB - -### SchemaRole - -| Column | Type | -|-------------|--------------------------| -| id | integer | -| created\_at | timestamp with time zone | -| updated\_at | timestamp with time zone | -| role | character varying(10) | -| schema\_id | integer | -| user\_id | integer | - -This should be deleted, and the permissions should be instead managed on the underlying DB. - -### SharedQuery - -| Column | Type | -|-------------|--------------------------| -| id | integer | -| created\_at | timestamp with time zone | -| updated\_at | timestamp with time zone | -| slug | uuid | -| enabled | boolean | -| query\_id | integer | - -This model should stay. No changes here. We need to add metadata about a credential for running the actual query. - -### SharedTable - -| Column | Type | -|-------------|--------------------------| -| id | integer | -| created\_at | timestamp with time zone | -| updated\_at | timestamp with time zone | -| slug | uuid | -| enabled | boolean | -| table\_id | integer | - -Only change is that we need to refer directly to a table OID, and handle permissions. - -### Table - -| Column | Type | -|--------------------|--------------------------| -| id | integer | -| created\_at | timestamp with time zone | -| updated\_at | timestamp with time zone | -| oid | integer | -| import\_verified | boolean | -| is\_temp | boolean | -| import\_target\_id | integer | -| schema\_id | integer | - -Stores info about: - -- whether the initial data import for the table has been manually verified by a user or not, and -- whether the table is actually a temporary holder for data intended for a preexisting table. - -Referenced by Column, Constraint, DataFile, SharedTable, Table, TableSettings, and UIQuery models - -We should combine this with the `TableSettings` model to create a `TableMetadata` model that just has that info, and drop all fkeys and references. - -### TableSettings - -| Column | Type | -|-----------------------|--------------------------| -| id | integer | -| created\_at | timestamp with time zone | -| updated\_at | timestamp with time zone | -| column\_order | jsonb | -| preview\_settings\_id | integer | -| table\_id | integer | - -This stores Mathesar-specific metadata about tables. Should be combined with remains of `Table` model. - -### UIQuery - -| Column | Type | -|------------------|--------------------------| -| id | integer | -| created\_at | timestamp with time zone | -| updated\_at | timestamp with time zone | -| name | character varying(128) | -| description | text | -| initial\_columns | jsonb | -| transformations | jsonb | -| display\_options | jsonb | -| display\_names | jsonb | -| base\_table\_id | integer | - -This stores a definition of a stored query that can be run on command. The main changes are that it should refer directly to DB-layer ids (oids and attnums) rather than Django-layer. - -### User - -| Column | Type | -|--------------------------|--------------------------| -| id | integer | -| password | character varying(128) | -| last\_login | timestamp with time zone | -| is\_superuser | boolean | -| username | character varying(150) | -| email | character varying(254) | -| is\_staff | boolean | -| is\_active | boolean | -| date\_joined | timestamp with time zone | -| full\_name | character varying(255) | -| short\_name | character varying(255) | -| password\_change\_needed | boolean | - -This stores user metadata. I think we should mostly keep it as is. It will be referenced by the `UserDatabaseRoleMap` model. - -## New Model setup - Some of this is tentative pending decisions in our Users and Permissions work. We should be able to handle anything being discussed through simple extensions of this model framework. Also, these models are intended to get us to beta, while providing flexibility to move forward afterwards. There will be discussion of a desired next iteration at the end. -### User +## User | Column | Type | Notes | |--------------------------|--------------------------|------------------| @@ -255,7 +19,7 @@ Some of this is tentative pending decisions in our Users and Permissions work. W | short\_name | character varying(255) | | | password\_change\_needed | boolean | | -### DatabaseServer +## DatabaseServer | Column | Type | Notes | |-------------|--------------------------|----------| @@ -271,7 +35,7 @@ Theoretically, we could also split the host out, but that seems like premature o We could consider making the `host` and `port` nullable when we're supporting `.pgpass`. -### Database +## Database | Column | Type | Notes | |---------------|--------------------------|-----------------------------------------| @@ -286,7 +50,7 @@ We could consider making the `host` and `port` nullable when we're supporting `. `(db_server, oid)` pair is unique, or `(db_server, db_name)` is. Should consider whether we should even store the `oid`, since we can't look up the `db_name` to connect without ... connecting, which requires the `db_name` rather than the `oid`. We could consider making `db_name` nullable when supporting `.pgpass` -### DatabaseServerCredential +## DatabaseServerCredential | Column | Type | Notes | |-------------|--------------------------|-----------------------------------------| @@ -299,7 +63,7 @@ We could consider making the `host` and `port` nullable when we're supporting `. We could consider making `username` and `password` nullable when supporting `.pgpass`. -### UserDatabaseRoleMap +## UserDatabaseRoleMap | Column | Type | Notes | |----------------|--------------------------|-----------------------------------------| @@ -313,13 +77,13 @@ We could consider making `username` and `password` nullable when supporting `.pg `(user, database)` pair is unique. The `metadata_role` isn't likely to be technically implemented as an `enum` on the DB for now. We'll use a Django-managed `TextChoices` field to save implementation time. See the current `DatabaseRole` model and its interaction with the `Role` class for an example. -### Aside: Quick overview of connecting to a DB. +## Aside: Quick overview of connecting to a DB. The Django permissions infrastructure should handle CRUD operations on `Database`, `DatabaseServerCredential`, `DatabaseServer`, and `UserDatabaseRoleMap` resources. Actually accessing a database wouldn't require the permissions infrastructure; we'd instead construct a connection string by joining the appropriate `database` to the other info found by looking up the `user, database` pair. For example, given a `(user, database)` pair like `(3, 8)`, we'd look up the appropriate row in the `UserDatabaseRoleMap` model to find the `role` (referencing `DatabaseServerCredential`). We also follow the foreign key to the `Database` to pick up the `db_name` and then the foreign key to `DatabaseServer` to pick up the `host` and `port`. We should eventually add functionality to store some details in a [`.pgpass`](https://www.postgresql.org/docs/current/libpq-pgpass.html) dotfile (though probably in a custom location). `psycopg` can inject the password automatically through these means. -### UIQuery +## UIQuery | Column | Type | Notes | |------------------|--------------------------|-------------------------| @@ -339,7 +103,7 @@ We should eventually add functionality to store some details in a [`.pgpass`](ht - We should consider changing `display_options` to refer to instances of `ColumnMetadata` within the JSONB - Permissions on this object will be derived from the `UserDatabaseRoleMap.metadata_role` via the `(database, user)` pair. -### ColumnMetadata +## ColumnMetadata | Column | Type | Notes | |-------------------------|--------------------------|-----------------------------------| @@ -370,7 +134,7 @@ We should eventually add functionality to store some details in a [`.pgpass`](ht - We don't need to reference any `schema_oid`, since a `(table_oid, attnum)` pair is unique per DB. - Permissions to manipulate instances of this model would be derived from permissions to manipulate the relevant table and column in the underlying database. -### TableMetadata +## TableMetadata | Column | Type | Notes | |---------------------|--------------------------|-----------------------------------| @@ -388,7 +152,7 @@ We should eventually add functionality to store some details in a [`.pgpass`](ht I've left the preview template in the Mathesar layer. The hope is that we can find a sufficiently featureful and also sufficiently efficient algorithm for getting the template, thereby avoiding needing to move this down into the user Database. There will be more discussion of this below. Permissions to manipulate this should be derived from permissions on the relevant table in the underlying database. -### DataFile +## DataFile | Column | Type | Notes | |---------------|--------------------------|----------| @@ -409,7 +173,7 @@ I've left the preview template in the Mathesar layer. The hope is that we can fi When we have our desired logic for cleaning this up sorted out, we should consider removing this model. It's currently only used ephemerally, but then the actuaul instance hangs around indefinitely. -### SharedQuery +## SharedQuery | Column | Type | Notes | |-------------|--------------------------|-----------------------------------------| @@ -423,7 +187,7 @@ When we have our desired logic for cleaning this up sorted out, we should consid I've chosen to store the credential id, rather than the creating user, for flexibility. We can derive this from a creating user at the time the query is created, and could (theoretically) update it if the User's credential for a given DB changes (I wouldn't recommend this). -### SharedTable +## SharedTable | Column | Type | Notes | |-------------|--------------------------|-----------------------------------------| @@ -441,7 +205,7 @@ For the beta, I'm hoping to avoid some work by keeping things in the Mathesar se I also think in the even longer term that we should think about storing our `UIQuery` info on the underlying database in the form of views (perhaps in a special `msar_queries` schema). This presents some technical problems, however, that we haven't yet solved. -### What about names vs. OIDs? +## What about names vs. OIDs? I thought about adding another model to store a general map of names to OIDs for use when resolving missing tables, etc. This would be useful if someone drops and recreates a table, or when trying to export your Mathesar Explorations or Display Settings. I didn't add that at this stage, since: diff --git a/docs/engineering/architecture/old_models.md b/docs/engineering/architecture/old_models.md new file mode 100644 index 000000000..a50821129 --- /dev/null +++ b/docs/engineering/architecture/old_models.md @@ -0,0 +1,233 @@ +# Deprecated Models + +This section contains ad-hoc notes on our current models, and intended changes. + +## Column + +| Column | Type | +|------------------|--------------------------| +| id | integer | +| created\_at | timestamp with time zone | +| updated\_at | timestamp with time zone | +| attnum | integer | +| display\_options | jsonb | +| table\_id | integer | + +The only actual info here is the display options for a given column, stored as a JSON blob. Rename to `ColumnMetadata`, restructure to validate display options, delete fkey fields. Consider moving to `ma_catalog` table on user DB. + +We need to handle updating the table preview template when a new column is added (or rethink the implementation of this functionality) + +We need to replace functionality to get `ui_type` from DB type. + +To replace the dependent-getting functionality, we need to move the dependents module to SQL. + +## Constraint + +| Column | Type | +|-------------|--------------------------| +| id | integer | +| created\_at | timestamp with time zone | +| updated\_at | timestamp with time zone | +| oid | integer | +| table\_id | integer | + +Nothing actually stored here. Delete this model. All functionality can be contained in User DB functions. + +## Database + +| Column | Type | +|-------------|--------------------------| +| id | integer | +| created\_at | timestamp with time zone | +| updated\_at | timestamp with time zone | +| name | character varying(128) | +| deleted | boolean | +| db\_name | character varying(128) | +| editable | boolean | +| host | character varying(255) | +| password | text | +| port | integer | +| username | text | + +Stores connection info to allow accessing a DB by creating an SQLAlchemy engine. + +Referenced by DatabaseRole and Schema models. + +Replace this with `Database`, `DatabaseServer`, `DatabaseServerCredential`, and `UserDatabaseRoleMap` models. See the [New models](./models.md) for details. + +## DatabaseRole + +| Column | Type | +|--------------|--------------------------| +| id | integer | +| created\_at | timestamp with time zone | +| updated\_at | timestamp with time zone | +| role | character varying(10) | +| database\_id | integer | +| user\_id | integer | + +This stores a role on a given database for a given user. We will repurpose this, and it will be applied (for now) only to `UIQuery` instances namespaced under a given database. + +## DataFile + +| Column | Type | +|-------------------------|--------------------------| +| id | integer | +| created\_at | timestamp with time zone | +| updated\_at | timestamp with time zone | +| file | character varying(100) | +| created\_from | character varying(128) | +| base\_name | character varying(100) | +| header | boolean | +| delimiter | character varying(1) | +| escapechar | character varying(1) | +| quotechar | character varying(1) | +| table\_imported\_to\_id | integer | +| user\_id | integer | +| type | character varying(128) | +| max\_level | integer | +| sheet\_index | integer | + +This stores metadata about files which have been uploaded for import into Mathesar. We should keep this model. `table_imported_to_id` should be removed (it's not used anywhere is it?). Also `max_level` seems like less of a data file attribute and more of an import setting. + +## PreviewColumnSettings + +| Column | Type | +|-------------|--------------------------| +| id | integer | +| created\_at | timestamp with time zone | +| updated\_at | timestamp with time zone | +| customized | boolean | +| template | character varying(255) | + +This stores the template defining what should be shown in a referencing fkey column for this table. This would be _much_ better as a `ma_catalog` table for efficiency reasons. + +In that case, a table's preview settings would be "global", i.e., it would be attached to the table rather than a user, table pair. + +Referenced by TableSettings. + +## Schema + +| Column | Type | +|--------------|--------------------------| +| id | integer | +| created\_at | timestamp with time zone | +| updated\_at | timestamp with time zone | +| oid | integer | +| database\_id | integer | + +Nothing stored here. + +Referenced by SchemaRole and Table models. + +Delete this model. All permissions handled by the referencing SchemaRole should instead be handled by the underlying user's permissions on the actual schema in the DB + +## SchemaRole + +| Column | Type | +|-------------|--------------------------| +| id | integer | +| created\_at | timestamp with time zone | +| updated\_at | timestamp with time zone | +| role | character varying(10) | +| schema\_id | integer | +| user\_id | integer | + +This should be deleted, and the permissions should be instead managed on the underlying DB. + +## SharedQuery + +| Column | Type | +|-------------|--------------------------| +| id | integer | +| created\_at | timestamp with time zone | +| updated\_at | timestamp with time zone | +| slug | uuid | +| enabled | boolean | +| query\_id | integer | + +This model should stay. No changes here. We need to add metadata about a credential for running the actual query. + +## SharedTable + +| Column | Type | +|-------------|--------------------------| +| id | integer | +| created\_at | timestamp with time zone | +| updated\_at | timestamp with time zone | +| slug | uuid | +| enabled | boolean | +| table\_id | integer | + +Only change is that we need to refer directly to a table OID, and handle permissions. + +## Table + +| Column | Type | +|--------------------|--------------------------| +| id | integer | +| created\_at | timestamp with time zone | +| updated\_at | timestamp with time zone | +| oid | integer | +| import\_verified | boolean | +| is\_temp | boolean | +| import\_target\_id | integer | +| schema\_id | integer | + +Stores info about: + +- whether the initial data import for the table has been manually verified by a user or not, and +- whether the table is actually a temporary holder for data intended for a preexisting table. + +Referenced by Column, Constraint, DataFile, SharedTable, Table, TableSettings, and UIQuery models + +We should combine this with the `TableSettings` model to create a `TableMetadata` model that just has that info, and drop all fkeys and references. + +## TableSettings + +| Column | Type | +|-----------------------|--------------------------| +| id | integer | +| created\_at | timestamp with time zone | +| updated\_at | timestamp with time zone | +| column\_order | jsonb | +| preview\_settings\_id | integer | +| table\_id | integer | + +This stores Mathesar-specific metadata about tables. Should be combined with remains of `Table` model. + +## UIQuery + +| Column | Type | +|------------------|--------------------------| +| id | integer | +| created\_at | timestamp with time zone | +| updated\_at | timestamp with time zone | +| name | character varying(128) | +| description | text | +| initial\_columns | jsonb | +| transformations | jsonb | +| display\_options | jsonb | +| display\_names | jsonb | +| base\_table\_id | integer | + +This stores a definition of a stored query that can be run on command. The main changes are that it should refer directly to DB-layer ids (oids and attnums) rather than Django-layer. + +## User + +| Column | Type | +|--------------------------|--------------------------| +| id | integer | +| password | character varying(128) | +| last\_login | timestamp with time zone | +| is\_superuser | boolean | +| username | character varying(150) | +| email | character varying(254) | +| is\_staff | boolean | +| is\_active | boolean | +| date\_joined | timestamp with time zone | +| full\_name | character varying(255) | +| short\_name | character varying(255) | +| password\_change\_needed | boolean | + +This stores user metadata. I think we should mostly keep it as is. It will be referenced by the `UserDatabaseRoleMap` model. From 7c134d1915d19b259bc6e4cfc272c33e657ccae4 Mon Sep 17 00:00:00 2001 From: Brent Moran Date: Wed, 27 Mar 2024 14:18:36 -0500 Subject: [PATCH 17/24] remove DB oid. We're not using it for anything. --- docs/engineering/architecture/models.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/docs/engineering/architecture/models.md b/docs/engineering/architecture/models.md index 435fa8e47..50e634ce5 100644 --- a/docs/engineering/architecture/models.md +++ b/docs/engineering/architecture/models.md @@ -1,6 +1,8 @@ # Models -Some of this is tentative pending decisions in our Users and Permissions work. We should be able to handle anything being discussed through simple extensions of this model framework. Also, these models are intended to get us to beta, while providing flexibility to move forward afterwards. There will be discussion of a desired next iteration at the end. +Subject to minor changes. + +We should be able to handle anything being discussed for beta through simple extensions of this model framework. Also, these models are intended to get us to beta, while providing flexibility to move forward afterwards. There will be a brief discussion of a desired next iteration at the end. ## User @@ -42,13 +44,12 @@ We could consider making the `host` and `port` nullable when we're supporting `. | id | integer | pkey | | created\_at | timestamp with time zone | | | updated\_at | timestamp with time zone | | -| oid | integer | not null | | db\_name | text | not null | | display\_name | text | not null; unique | | db\_server | integer | not null; references DatabaseServer(id) | | editable | boolean | | -`(db_server, oid)` pair is unique, or `(db_server, db_name)` is. Should consider whether we should even store the `oid`, since we can't look up the `db_name` to connect without ... connecting, which requires the `db_name` rather than the `oid`. We could consider making `db_name` nullable when supporting `.pgpass` +`(db_server, db_name)` is unique. We could consider making `db_name` nullable when supporting `.pgpass`. ## DatabaseServerCredential @@ -81,7 +82,7 @@ We could consider making `username` and `password` nullable when supporting `.pg The Django permissions infrastructure should handle CRUD operations on `Database`, `DatabaseServerCredential`, `DatabaseServer`, and `UserDatabaseRoleMap` resources. Actually accessing a database wouldn't require the permissions infrastructure; we'd instead construct a connection string by joining the appropriate `database` to the other info found by looking up the `user, database` pair. For example, given a `(user, database)` pair like `(3, 8)`, we'd look up the appropriate row in the `UserDatabaseRoleMap` model to find the `role` (referencing `DatabaseServerCredential`). We also follow the foreign key to the `Database` to pick up the `db_name` and then the foreign key to `DatabaseServer` to pick up the `host` and `port`. -We should eventually add functionality to store some details in a [`.pgpass`](https://www.postgresql.org/docs/current/libpq-pgpass.html) dotfile (though probably in a custom location). `psycopg` can inject the password automatically through these means. +We should eventually add functionality to store some details in a [`.pgpass`](https://www.postgresql.org/docs/current/libpq-pgpass.html) dotfile (though probably in a custom location). `psycopg` can inject the password and/or other missing pieces automatically through these means. ## UIQuery From 5cdf75ac06732cb852f731a140b242a4df60b33c Mon Sep 17 00:00:00 2001 From: Brent Moran Date: Wed, 27 Mar 2024 14:52:05 -0500 Subject: [PATCH 18/24] add default credential field to model --- docs/engineering/architecture/models.md | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/docs/engineering/architecture/models.md b/docs/engineering/architecture/models.md index 50e634ce5..d4254012d 100644 --- a/docs/engineering/architecture/models.md +++ b/docs/engineering/architecture/models.md @@ -39,17 +39,18 @@ We could consider making the `host` and `port` nullable when we're supporting `. ## Database -| Column | Type | Notes | -|---------------|--------------------------|-----------------------------------------| -| id | integer | pkey | -| created\_at | timestamp with time zone | | -| updated\_at | timestamp with time zone | | -| db\_name | text | not null | -| display\_name | text | not null; unique | -| db\_server | integer | not null; references DatabaseServer(id) | -| editable | boolean | | - -`(db_server, db_name)` is unique. We could consider making `db_name` nullable when supporting `.pgpass`. +| Column | Type | Notes | +|---------------------|--------------------------|---------------------------------------------------| +| id | integer | pkey | +| created\_at | timestamp with time zone | | +| updated\_at | timestamp with time zone | | +| db\_name | text | not null | +| display\_name | text | not null; unique | +| db\_server | integer | not null; references DatabaseServer(id) | +| editable | boolean | | +| default\_credential | integer | not null; references DatabaseServerCredential(id) | + +`(db_server, db_name)` is unique. We could consider making `db_name` nullable when supporting `.pgpass`. If a Mathesar Admin user doesn't have an entry in `UserDatabaseRoleMap` for a given database, they will use the `default_credential` defined here to connect. ## DatabaseServerCredential From e4d14beb33081d6a7b8c62460904f40e936e82ac Mon Sep 17 00:00:00 2001 From: Brent Moran Date: Wed, 27 Mar 2024 14:54:17 -0500 Subject: [PATCH 19/24] Add not-null constraints for defining UserDatabaseRoleMap entry --- docs/engineering/architecture/models.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/engineering/architecture/models.md b/docs/engineering/architecture/models.md index d4254012d..edbcfae40 100644 --- a/docs/engineering/architecture/models.md +++ b/docs/engineering/architecture/models.md @@ -72,8 +72,8 @@ We could consider making `username` and `password` nullable when supporting `.pg | id | integer | pkey | | created\_at | timestamp with time zone | | | updated\_at | timestamp with time zone | | -| user | integer | references User(id) | -| database | integer | references Database(id) | +| user | integer | not null; references User(id) | +| database | integer | not null; references Database(id) | | database\_role | integer | references DatabaseServerCredential(id) | | metadata\_role | enum | ('read only', 'read write') | From 7c63e968fd299309b668f76d7128844a33612ebc Mon Sep 17 00:00:00 2001 From: Brent Moran Date: Wed, 27 Mar 2024 14:57:49 -0500 Subject: [PATCH 20/24] add logic to handle not-nulls in the Database model --- docs/engineering/architecture/models.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/engineering/architecture/models.md b/docs/engineering/architecture/models.md index edbcfae40..3e99d1dd0 100644 --- a/docs/engineering/architecture/models.md +++ b/docs/engineering/architecture/models.md @@ -81,7 +81,7 @@ We could consider making `username` and `password` nullable when supporting `.pg ## Aside: Quick overview of connecting to a DB. -The Django permissions infrastructure should handle CRUD operations on `Database`, `DatabaseServerCredential`, `DatabaseServer`, and `UserDatabaseRoleMap` resources. Actually accessing a database wouldn't require the permissions infrastructure; we'd instead construct a connection string by joining the appropriate `database` to the other info found by looking up the `user, database` pair. For example, given a `(user, database)` pair like `(3, 8)`, we'd look up the appropriate row in the `UserDatabaseRoleMap` model to find the `role` (referencing `DatabaseServerCredential`). We also follow the foreign key to the `Database` to pick up the `db_name` and then the foreign key to `DatabaseServer` to pick up the `host` and `port`. +The Django permissions infrastructure should handle CRUD operations on `Database`, `DatabaseServerCredential`, `DatabaseServer`, and `UserDatabaseRoleMap` resources. When adding a `Database` for the first time, we'll also add a `DatabaseServer` if one doesn't exist, and add or choose a `DatabaseServerCredential` to be the default based on the credential provided when adding the `Database` entry. Actually accessing a database wouldn't require the permissions infrastructure; we'd instead construct a connection string by joining the appropriate `database` to the other info found by looking up the `user, database` pair. For example, given a `(user, database)` pair like `(3, 8)`, we'd look up the appropriate row in the `UserDatabaseRoleMap` model to find the `role` (referencing `DatabaseServerCredential`). We also follow the foreign key to the `Database` to pick up the `db_name` and then the foreign key to `DatabaseServer` to pick up the `host` and `port`. We should eventually add functionality to store some details in a [`.pgpass`](https://www.postgresql.org/docs/current/libpq-pgpass.html) dotfile (though probably in a custom location). `psycopg` can inject the password and/or other missing pieces automatically through these means. From 02e0451b189eb9e484b75bb4dc71336db4674297 Mon Sep 17 00:00:00 2001 From: Brent Moran Date: Wed, 27 Mar 2024 15:12:48 -0500 Subject: [PATCH 21/24] use DB for brevity; fkeys are table names --- docs/engineering/architecture/models.md | 108 ++++++++++++------------ 1 file changed, 54 insertions(+), 54 deletions(-) diff --git a/docs/engineering/architecture/models.md b/docs/engineering/architecture/models.md index 3e99d1dd0..6cc66226a 100644 --- a/docs/engineering/architecture/models.md +++ b/docs/engineering/architecture/models.md @@ -21,7 +21,7 @@ We should be able to handle anything being discussed for beta through simple ext | short\_name | character varying(255) | | | password\_change\_needed | boolean | | -## DatabaseServer +## DBServer | Column | Type | Notes | |-------------|--------------------------|----------| @@ -39,49 +39,49 @@ We could consider making the `host` and `port` nullable when we're supporting `. ## Database -| Column | Type | Notes | -|---------------------|--------------------------|---------------------------------------------------| -| id | integer | pkey | -| created\_at | timestamp with time zone | | -| updated\_at | timestamp with time zone | | -| db\_name | text | not null | -| display\_name | text | not null; unique | -| db\_server | integer | not null; references DatabaseServer(id) | -| editable | boolean | | -| default\_credential | integer | not null; references DatabaseServerCredential(id) | - -`(db_server, db_name)` is unique. We could consider making `db_name` nullable when supporting `.pgpass`. If a Mathesar Admin user doesn't have an entry in `UserDatabaseRoleMap` for a given database, they will use the `default_credential` defined here to connect. - -## DatabaseServerCredential - -| Column | Type | Notes | -|-------------|--------------------------|-----------------------------------------| -| id | integer | pkey | -| created\_at | timestamp with time zone | | -| updated\_at | timestamp with time zone | | -| username | character varying | not null | -| password | character varying | encrypted; not null | -| db\_server | integer | not null; references DatabaseServer(id) | +| Column | Type | Notes | +|---------------------------------|--------------------------|---------------------------------------------| +| id | integer | pkey | +| created\_at | timestamp with time zone | | +| updated\_at | timestamp with time zone | | +| db\_name | text | not null | +| display\_name | text | not null; unique | +| db\_server | integer | not null; references DBServer(id) | +| editable | boolean | | +| default\_db\_server\_credential | integer | not null; references DBServerCredential(id) | + +`(db_server, db_name)` is unique. We could consider making `db_name` nullable when supporting `.pgpass`. If a Mathesar Admin user doesn't have an entry in `UserDBRoleMap` for a given database, they will use the `default_credential` defined here to connect. + +## DBServerCredential + +| Column | Type | Notes | +|-------------|--------------------------|-----------------------------------| +| id | integer | pkey | +| created\_at | timestamp with time zone | | +| updated\_at | timestamp with time zone | | +| username | character varying | not null | +| password | character varying | encrypted; not null | +| db\_server | integer | not null; references DBServer(id) | We could consider making `username` and `password` nullable when supporting `.pgpass`. -## UserDatabaseRoleMap +## UserDBRoleMap -| Column | Type | Notes | -|----------------|--------------------------|-----------------------------------------| -| id | integer | pkey | -| created\_at | timestamp with time zone | | -| updated\_at | timestamp with time zone | | -| user | integer | not null; references User(id) | -| database | integer | not null; references Database(id) | -| database\_role | integer | references DatabaseServerCredential(id) | -| metadata\_role | enum | ('read only', 'read write') | +| Column | Type | Notes | +|-----------------------|--------------------------|-----------------------------------| +| id | integer | pkey | +| created\_at | timestamp with time zone | | +| updated\_at | timestamp with time zone | | +| user | integer | not null; references User(id) | +| database | integer | not null; references Database(id) | +| db\_server_credential | integer | references DBServerCredential(id) | +| metadata\_role | enum | ('read only', 'read write') | `(user, database)` pair is unique. The `metadata_role` isn't likely to be technically implemented as an `enum` on the DB for now. We'll use a Django-managed `TextChoices` field to save implementation time. See the current `DatabaseRole` model and its interaction with the `Role` class for an example. ## Aside: Quick overview of connecting to a DB. -The Django permissions infrastructure should handle CRUD operations on `Database`, `DatabaseServerCredential`, `DatabaseServer`, and `UserDatabaseRoleMap` resources. When adding a `Database` for the first time, we'll also add a `DatabaseServer` if one doesn't exist, and add or choose a `DatabaseServerCredential` to be the default based on the credential provided when adding the `Database` entry. Actually accessing a database wouldn't require the permissions infrastructure; we'd instead construct a connection string by joining the appropriate `database` to the other info found by looking up the `user, database` pair. For example, given a `(user, database)` pair like `(3, 8)`, we'd look up the appropriate row in the `UserDatabaseRoleMap` model to find the `role` (referencing `DatabaseServerCredential`). We also follow the foreign key to the `Database` to pick up the `db_name` and then the foreign key to `DatabaseServer` to pick up the `host` and `port`. +The Django permissions infrastructure should handle CRUD operations on `Database`, `DBServerCredential`, `DBServer`, and `UserDBRoleMap` resources. When adding a `Database` for the first time, we'll also add a `DBServer` if one doesn't exist, and add or choose a `DBServerCredential` to be the default based on the credential provided when adding the `Database` entry. Actually accessing a database wouldn't require the permissions infrastructure; we'd instead construct a connection string by joining the appropriate `database` to the other info found by looking up the `user, database` pair. For example, given a `(user, database)` pair like `(3, 8)`, we'd look up the appropriate row in the `UserDBRoleMap` model to find the `role` (referencing `DBServerCredential`). We also follow the foreign key to the `Database` to pick up the `db_name` and then the foreign key to `DBServer` to pick up the `host` and `port`. We should eventually add functionality to store some details in a [`.pgpass`](https://www.postgresql.org/docs/current/libpq-pgpass.html) dotfile (though probably in a custom location). `psycopg` can inject the password and/or other missing pieces automatically through these means. @@ -103,7 +103,7 @@ We should eventually add functionality to store some details in a [`.pgpass`](ht - The JSONB columns are the same format, except now they refer to DB-layer ids, e.g., OIDs and attnums rather than Django-layer IDs. - We should consider changing `display_options` to refer to instances of `ColumnMetadata` within the JSONB -- Permissions on this object will be derived from the `UserDatabaseRoleMap.metadata_role` via the `(database, user)` pair. +- Permissions on this object will be derived from the `UserDBRoleMap.metadata_role` via the `(database, user)` pair. ## ColumnMetadata @@ -177,29 +177,29 @@ When we have our desired logic for cleaning this up sorted out, we should consid ## SharedQuery -| Column | Type | Notes | -|-------------|--------------------------|-----------------------------------------| -| id | integer | pkey | -| created\_at | timestamp with time zone | | -| updated\_at | timestamp with time zone | | -| slug | uuid | unique | -| enabled | boolean | | -| query | integer | not null; references UIQuery(id) | -| credential | integer | references DatabaseServerCredential(id) | +| Column | Type | Notes | +|-------------|--------------------------|-----------------------------------| +| id | integer | pkey | +| created\_at | timestamp with time zone | | +| updated\_at | timestamp with time zone | | +| slug | uuid | unique | +| enabled | boolean | | +| query | integer | not null; references UIQuery(id) | +| credential | integer | references DBServerCredential(id) | I've chosen to store the credential id, rather than the creating user, for flexibility. We can derive this from a creating user at the time the query is created, and could (theoretically) update it if the User's credential for a given DB changes (I wouldn't recommend this). ## SharedTable -| Column | Type | Notes | -|-------------|--------------------------|-----------------------------------------| -| id | integer | pkey | -| created\_at | timestamp with time zone | | -| updated\_at | timestamp with time zone | | -| slug | uuid | unique | -| enabled | boolean | | -| table\_oid | integer | not null | -| credential | integer | references DatabaseServerCredential(id) | +| Column | Type | Notes | +|-------------|--------------------------|-----------------------------------| +| id | integer | pkey | +| created\_at | timestamp with time zone | | +| updated\_at | timestamp with time zone | | +| slug | uuid | unique | +| enabled | boolean | | +| table\_oid | integer | not null | +| credential | integer | references DBServerCredential(id) | ## After-beta-term vision From 02f2ef410e5c17db751997a48303321825cfd1f4 Mon Sep 17 00:00:00 2001 From: Brent Moran Date: Wed, 27 Mar 2024 15:16:00 -0500 Subject: [PATCH 22/24] change forgotten fkey attributes to match target table --- docs/engineering/architecture/models.md | 40 ++++++++++++------------- 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/docs/engineering/architecture/models.md b/docs/engineering/architecture/models.md index 6cc66226a..405a92637 100644 --- a/docs/engineering/architecture/models.md +++ b/docs/engineering/architecture/models.md @@ -177,29 +177,29 @@ When we have our desired logic for cleaning this up sorted out, we should consid ## SharedQuery -| Column | Type | Notes | -|-------------|--------------------------|-----------------------------------| -| id | integer | pkey | -| created\_at | timestamp with time zone | | -| updated\_at | timestamp with time zone | | -| slug | uuid | unique | -| enabled | boolean | | -| query | integer | not null; references UIQuery(id) | -| credential | integer | references DBServerCredential(id) | - -I've chosen to store the credential id, rather than the creating user, for flexibility. We can derive this from a creating user at the time the query is created, and could (theoretically) update it if the User's credential for a given DB changes (I wouldn't recommend this). +| Column | Type | Notes | +|------------------------|--------------------------|-----------------------------------| +| id | integer | pkey | +| created\_at | timestamp with time zone | | +| updated\_at | timestamp with time zone | | +| slug | uuid | unique | +| enabled | boolean | | +| query | integer | not null; references UIQuery(id) | +| db\_server\_credential | integer | references DBServerCredential(id) | + +I've chosen to store the `db_server_credential` id, rather than the creating user, for flexibility. We can derive this from a creating user at the time the query is created, and could (theoretically) update it if the User's credential for a given DB changes (I wouldn't recommend this). ## SharedTable -| Column | Type | Notes | -|-------------|--------------------------|-----------------------------------| -| id | integer | pkey | -| created\_at | timestamp with time zone | | -| updated\_at | timestamp with time zone | | -| slug | uuid | unique | -| enabled | boolean | | -| table\_oid | integer | not null | -| credential | integer | references DBServerCredential(id) | +| Column | Type | Notes | +|------------------------|--------------------------|-----------------------------------| +| id | integer | pkey | +| created\_at | timestamp with time zone | | +| updated\_at | timestamp with time zone | | +| slug | uuid | unique | +| enabled | boolean | | +| table\_oid | integer | not null | +| db\_server\_credential | integer | references DBServerCredential(id) | ## After-beta-term vision From e1ece7a3de5568d1178c9739a9418fff139471b1 Mon Sep 17 00:00:00 2001 From: Brent Moran Date: Wed, 27 Mar 2024 15:18:43 -0500 Subject: [PATCH 23/24] Use Exploration instead of UIQuery --- docs/engineering/architecture/models.md | 26 ++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/docs/engineering/architecture/models.md b/docs/engineering/architecture/models.md index 405a92637..a611e17da 100644 --- a/docs/engineering/architecture/models.md +++ b/docs/engineering/architecture/models.md @@ -85,7 +85,7 @@ The Django permissions infrastructure should handle CRUD operations on `Database We should eventually add functionality to store some details in a [`.pgpass`](https://www.postgresql.org/docs/current/libpq-pgpass.html) dotfile (though probably in a custom location). `psycopg` can inject the password and/or other missing pieces automatically through these means. -## UIQuery +## Exploration | Column | Type | Notes | |------------------|--------------------------|-------------------------| @@ -175,19 +175,19 @@ I've left the preview template in the Mathesar layer. The hope is that we can fi When we have our desired logic for cleaning this up sorted out, we should consider removing this model. It's currently only used ephemerally, but then the actuaul instance hangs around indefinitely. -## SharedQuery +## SharedExploration -| Column | Type | Notes | -|------------------------|--------------------------|-----------------------------------| -| id | integer | pkey | -| created\_at | timestamp with time zone | | -| updated\_at | timestamp with time zone | | -| slug | uuid | unique | -| enabled | boolean | | -| query | integer | not null; references UIQuery(id) | -| db\_server\_credential | integer | references DBServerCredential(id) | +| Column | Type | Notes | +|------------------------|--------------------------|--------------------------------------| +| id | integer | pkey | +| created\_at | timestamp with time zone | | +| updated\_at | timestamp with time zone | | +| slug | uuid | unique | +| enabled | boolean | | +| exploration | integer | not null; references Exploration(id) | +| db\_server\_credential | integer | references DBServerCredential(id) | -I've chosen to store the `db_server_credential` id, rather than the creating user, for flexibility. We can derive this from a creating user at the time the query is created, and could (theoretically) update it if the User's credential for a given DB changes (I wouldn't recommend this). +I've chosen to store the `db_server_credential` id, rather than the creating user, for flexibility. We can derive this from a creating user at the time the Exploration is created, and could (theoretically) update it if the User's credential for a given DB changes (I wouldn't recommend this). ## SharedTable @@ -205,7 +205,7 @@ I've chosen to store the `db_server_credential` id, rather than the creating use For the beta, I'm hoping to avoid some work by keeping things in the Mathesar service models that I'd rather store in the underlying User Databases in a `msar_catalog` schema. The relevant models are `ColumnMetadata` and `TableMetadata`. A big motivation to move this info to the User DB is performance w.r.t. the table previews. Our current algorithm requires lots of back-and-forth between the service layer and the User DB in order to recursively build these preview templates, and to fill them. I also think it's more natural to keep these metadata models in the User DB, since they're segregated by User DB, and each instance only refers to objects on that underlying database. -I also think in the even longer term that we should think about storing our `UIQuery` info on the underlying database in the form of views (perhaps in a special `msar_queries` schema). This presents some technical problems, however, that we haven't yet solved. +I also think in the even longer term that we should think about storing our `Exploration` info on the underlying database in the form of views (perhaps in a special `msar_queries` schema). This presents some technical problems, however, that we haven't yet solved. ## What about names vs. OIDs? From 96e94e5f5ce9af90cb8f55da60dce9b5d9536134 Mon Sep 17 00:00:00 2001 From: Brent Moran Date: Mon, 1 Apr 2024 16:19:01 -0500 Subject: [PATCH 24/24] Update permissions doc to match new models --- docs/engineering/architecture/models.md | 2 +- docs/engineering/architecture/permissions.md | 26 +++++++++++--------- 2 files changed, 15 insertions(+), 13 deletions(-) diff --git a/docs/engineering/architecture/models.md b/docs/engineering/architecture/models.md index a611e17da..7017b8fd6 100644 --- a/docs/engineering/architecture/models.md +++ b/docs/engineering/architecture/models.md @@ -81,7 +81,7 @@ We could consider making `username` and `password` nullable when supporting `.pg ## Aside: Quick overview of connecting to a DB. -The Django permissions infrastructure should handle CRUD operations on `Database`, `DBServerCredential`, `DBServer`, and `UserDBRoleMap` resources. When adding a `Database` for the first time, we'll also add a `DBServer` if one doesn't exist, and add or choose a `DBServerCredential` to be the default based on the credential provided when adding the `Database` entry. Actually accessing a database wouldn't require the permissions infrastructure; we'd instead construct a connection string by joining the appropriate `database` to the other info found by looking up the `user, database` pair. For example, given a `(user, database)` pair like `(3, 8)`, we'd look up the appropriate row in the `UserDBRoleMap` model to find the `role` (referencing `DBServerCredential`). We also follow the foreign key to the `Database` to pick up the `db_name` and then the foreign key to `DBServer` to pick up the `host` and `port`. +The Django permissions infrastructure should handle CRUD operations on `Database`, `DBServerCredential`, `DBServer`, and `UserDBRoleMap` resources. When adding a `Database` for the first time, we'll also add a `DBServer` if one doesn't exist, and add or choose a `DBServerCredential` to be the default based on the credential provided when adding the `Database` entry. Actually accessing a database wouldn't require the permissions infrastructure; we'd instead construct a connection string by joining the appropriate `database` to the other info found by looking up the `user, database` pair. For example, given a `(user, database)` pair like `(3, 8)`, we'd look up the appropriate row in the `UserDBRoleMap` model to find the `db_server_credential` (referencing `DBServerCredential`). We also follow the foreign key to the `Database` to pick up the `db_name` and then the foreign key to `DBServer` to pick up the `host` and `port`. We should eventually add functionality to store some details in a [`.pgpass`](https://www.postgresql.org/docs/current/libpq-pgpass.html) dotfile (though probably in a custom location). `psycopg` can inject the password and/or other missing pieces automatically through these means. diff --git a/docs/engineering/architecture/permissions.md b/docs/engineering/architecture/permissions.md index 87fbfaeb0..232c925f9 100644 --- a/docs/engineering/architecture/permissions.md +++ b/docs/engineering/architecture/permissions.md @@ -14,13 +14,13 @@ This model holds the `host` and `port` information for a given database. As described in [the models section](./models.md), This will store the authentication information needed to create an engine, but won't provide a database (a required part of a connection definition in PostgreSQL). Because the information includes a role (to define the initial connection role), it necessarily defines a set of privileges available on the database with that connection. -## User Database Map Model +## User Database Role Map Model -This map uses a `user_id, database` pair to look up the needed credentials, then provides a connection to the database using those credentials (if possible). +This map uses a `user, database` pair to look up the needed credentials, then provides a connection to the database using those credentials (if possible). ## Adding Connections (backend perspective) -Regardless of UI, the backend should receive a `POST` request to the new RPC endpoint defining a new connection. We'll use the same functions set up in [PR \#3348](https://github.com/mathesar-foundation/mathesar/pull/3348). The relevant info should be stored in the `Database`, `DatabaseServer`, and `DatabaseServerCredential` models. Also, if the user does not already have an entry defining their role on the given database, we could create such an entry automatically in the `UserDatabaseMap` model. This is optional, and doesn't really affect the architecture. Then, to let some other users access that connection, we will provide an RPC function that lets an admin set different users' credentials for the given database by creating or updating `UserDatabaseMap` resources. Note that this does _not_ directly modify anything to do with permissions on actual database objects (e.g., schemata or tables). +Regardless of UI, the backend should receive a `POST` request to the new RPC endpoint defining a new connection. We'll use the same functions set up in [PR \#3348](https://github.com/mathesar-foundation/mathesar/pull/3348). The relevant info should be stored in the `Database`, `DatabaseServer`, and `DatabaseServerCredential` models. Also, if the user does not already have an entry defining their role on the given database, we could create such an entry automatically in the `UserDBRoleMap` model. This is optional, and doesn't really affect the architecture. Then, to let some other users access that connection, we will provide an RPC function that lets an admin set different users' credentials for the given database by creating or updating `UserDBRoleMap` resources. Note that this does _not_ directly modify anything to do with permissions on actual database objects (e.g., schemata or tables). ## Granting database object privileges @@ -28,7 +28,7 @@ The backend will provide RPC functions that let an admin (who has access to a su - Privileges on DB objects are thus not granted directly to Mathesar users. - Privileges on DB objects are granted to DB roles, which may be accessible to some (or all) Mathesar users. -- Anytime an RPC function requiring DB access runs, it runs using the connection defined via the `UserDatabaseMap`, with the associated privileges on that database. +- Anytime an RPC function requiring DB access runs, it runs using the connection defined via the `UserDBRoleMap`, with the associated privileges on that database. !!! question "UX Question" Should the admin think in terms of groups of Mathesar users, or specifically in terms of connections when dealing with DB-level privileges? @@ -39,17 +39,19 @@ The backend will provide RPC functions that let an admin (who has access to a su - `GRANT` the owning DB role (ostensibly a user) to DB users connectable by the relevant Mathesar users. - Have at least one DB superuser available for use with a connection, and give relevant Mathesar users access to that DB superuser. -## Metadata object privileges +## Mathesar object privileges -Examples of these are table properties like preview columns. Permissions for these are derived from permissions on relevant User DB objects. So, if a user wants to modify the preview template for a table, they would be checked against the permission to modify the underlying table structure. Thus, modifying this metadata will occur in a request to modify the associated table, and the privilege check will be whether the relevant DB user is allowed to `ALTER` that table. - -## Standalone Mathesar objects - -An example of this would be an Exploration. The privileges needed to actually _run_ the query are handled on the database as described above, but a user should have access to look at the exploration definition itself (as well as edit/delete) handled by the Django access policy framework. It seems like our current Manager/Editor/Viewer framework should suffice. +Examples of such objects are Explorations, and table properties like preview columns or display options. Permissions on these will be: +- None, +- read only, or +- read write +These permissions will be tracked in the `UserDBRoleMap` model via the `metadata_role` attribute. Note that these permissions are only related to CRUD operations on these objects. In the case of Explorations, actually _running_ the exploration is dependent on: +- at least the read permission on the object, and +- the database-level permissions associated with the connection available to the user. ### Current plan -In the case of Explorations (`UIQuery` model), this will be derived from a policy scoped via the `Database` associated with the Exploration: +In the case of Explorations (`Exploration` model), this will be derived from a policy scoped via the `Database` associated with the Exploration: - A super user of Mathesar will set the policy for a given `User` on a given `Database` instance via the UI. - When a `User` wants to view/edit/manage an Exploration, the web service will check the `user, database` pair (where `database`) is the `Database` associated with the Exploration to find a relevant policy. @@ -57,7 +59,7 @@ In the case of Explorations (`UIQuery` model), this will be derived from a polic ### Alternative plan -In the case of Explorations (`UIQuery` model), this will be derived from a policy scoped via the `Database` associated with the Exploration: +In the case of Explorations (`Exploration` model), this will be derived from a policy scoped via the `Database` associated with the Exploration: - A super user of Mathesar will set the policy for `DatabaseServerCredential` instances via the UI. - Each credential instance represents a Role on the DB.