From 3dc7b50e31f2d7dc7f284e3d720120b03685cbaa Mon Sep 17 00:00:00 2001 From: Ravi Meijer Date: Fri, 18 Oct 2024 14:01:33 +0200 Subject: [PATCH 1/5] Add ADR on system card storage --- .../amt/adrs/0008-systemcard-storage.md | 54 +++++++++++++++++++ 1 file changed, 54 insertions(+) create mode 100644 docs/projects/amt/adrs/0008-systemcard-storage.md diff --git a/docs/projects/amt/adrs/0008-systemcard-storage.md b/docs/projects/amt/adrs/0008-systemcard-storage.md new file mode 100644 index 00000000..1f9ec7f6 --- /dev/null +++ b/docs/projects/amt/adrs/0008-systemcard-storage.md @@ -0,0 +1,54 @@ +# AMT-0008 System card Storage + +## Context + +By default, Kubernetes pods use ephemeral storage for their containers. This storage is tied to the +lifecycle of the pod. Thus, when the pod terminates or restarts, the data is lost. +The /tmp/ directory is part of the system's temporary file storage. Files stored here are wiped out +upon system reboots or restarts of pods. This leads to the deletion of the system_cards, which would +need to be recreated or fetched again. + +## Assumptions + +* The system card data is small to moderate in size and fits within the capabilities of Postgres' JSONB storage. +* Tracking changes to the system card data over time is not a priority in the short term, but may become +necessary in the future. +* Fast access and transactional integrity are critical requirements for the system. + +## Decision + +A Algorithm Systems system card is stored solely as a JSONB blob in the projects table in Postgres, +with no additional storage elsewhere. + +## Risks + +* **Data Overwrite**: As the system card is overwritten with each update, it becomes difficult to track +historical changes or revert to previous states. +* **Scaling**: As the project grows, managing larger JSONB blobs may present performance challenges, +particularly when handling complex queries. 
+* **Collaboration**: Collaborating on the system card content is more difficult, as the JSONB format +requires parsing and manual intervention for certain tasks. +* **Limited Querying**: While Postgres supports querying JSONB, complex queries and data manipulations +may be inefficient without proper indexing or further optimization. + +## Consequences + +### Positive + +* **Fast implementation**: The solution is easy to set up, reducing the time to get the project operational. +* **Future proof**: This approach is designed with future scalability in mind. While system cards are initially +stored in Postgres as JSONB blobs, the plan to transition to a persistent volume with versioned YAML files +managed through Git ensures easy adaptation. The future migration to a remote Git-based storage system offers +enhanced version control, collaboration, and auditing, with minimal disruption to existing workflows. +* **Fast access**: Storing the data in Postgres ensures fast access, as everything is contained within a single source. +* **Single source**: Keeping everything in one database simplifies backups and maintenance. +* **Built-in permissions**: Postgres provides built-in access control and security through its permission system. + +### Negative + +* **Data tracking**: Changes to the system card are overwritten, making it difficult to maintain a history or audit trail. +* **Complex queries**: Complex queries, especially those involving nested data or formulas in the JSONB blob, +can be inefficient and require custom parsing. +* **Collaboration**: Team members face difficulties working collaboratively on the JSONB data due to its format and +lack of versioning. +* **Scalability**: As the JSONB blobs grow in size, the storage overhead and query performance may become significant issues. 
From 57f0507877add884d5aea7b4c186d90ef9487a1d Mon Sep 17 00:00:00 2001 From: Ravi Meijer Date: Fri, 18 Oct 2024 15:37:46 +0200 Subject: [PATCH 2/5] Add links and ambition --- docs/index.md | 3 +- .../amt/adrs/0008-systemcard-storage.md | 30 ++++++++----------- docs/projects/amt/index.md | 18 +++++------ 3 files changed, 24 insertions(+), 27 deletions(-) diff --git a/docs/index.md b/docs/index.md index 1c27b96d..99d150f4 100644 --- a/docs/index.md +++ b/docs/index.md @@ -17,9 +17,10 @@ We work on the following projects within the _Transparency of Algorithmic Decisi graph TB ak[Algoritmekader] <--> amt - subgraph amt[Algorithm Management Toolkit] + subgraph amt[Algorithm Management Toolkit] tr[Task Registry] --> amp[Algorithm Management Platform] st[Reporting Standard] --> amp + amp <--> ai_act_decision_tree[AI Act Decision Tree] amp <--> llm[LLM Benchmark Tooling] end diff --git a/docs/projects/amt/adrs/0008-systemcard-storage.md b/docs/projects/amt/adrs/0008-systemcard-storage.md index 1f9ec7f6..48145c18 100644 --- a/docs/projects/amt/adrs/0008-systemcard-storage.md +++ b/docs/projects/amt/adrs/0008-systemcard-storage.md @@ -2,22 +2,20 @@ ## Context -By default, Kubernetes pods use ephemeral storage for their containers. This storage is tied to the -lifecycle of the pod. Thus, when the pod terminates or restarts, the data is lost. -The /tmp/ directory is part of the system's temporary file storage. Files stored here are wiped out -upon system reboots or restarts of pods. This leads to the deletion of the system_cards, which would -need to be recreated or fetched again. +By default, Kubernetes pods use ephemeral storage, which is tied to the pod's lifecycle. +When the pod terminates or restarts, all data is lost. The `/tmp/` directory, being part of +the system's temporary file storage, is cleared during reboots or pod restarts, resulting in +the deletion of system_cards. Therefore, we need a different kind of storage to preserve the data. 
## Assumptions * The system card data is small to moderate in size and fits within the capabilities of Postgres' JSONB storage. * Tracking changes to the system card data over time is not a priority in the short term, but may become necessary in the future. -* Fast access and transactional integrity are critical requirements for the system. ## Decision -A Algorithm Systems system card is stored solely as a JSONB blob in the projects table in Postgres, +The system card of an algorithm system is stored solely as a JSONB blob in the projects table in Postgres, with no additional storage elsewhere. ## Risks @@ -36,19 +34,17 @@ may be inefficient without proper indexing or further optimization. ### Positive * **Fast implementation**: The solution is easy to set up, reducing the time to get the project operational. -* **Future proof**: This approach is designed with future scalability in mind. While system cards are initially -stored in Postgres as JSONB blobs, the plan to transition to a persistent volume with versioned YAML files -managed through Git ensures easy adaptation. The future migration to a remote Git-based storage system offers -enhanced version control, collaboration, and auditing, with minimal disruption to existing workflows. -* **Fast access**: Storing the data in Postgres ensures fast access, as everything is contained within a single source. -* **Single source**: Keeping everything in one database simplifies backups and maintenance. +* **Future proof**: This approach is designed with future scalability in mind. While system cards will initially +be stored in Postgres as JSONB blobs, we anticipate migrating to a Git-based local or remote storage solution +as the system evolves. Importantly, this initial decision allows for a seamless transition in the future, +ensuring no obstacles to migration. 
+* **Single source & Fast access**: Centralizing everything in a single Postgres database streamlines backups, +reduces maintenance complexity, and ensures quick data access. * **Built-in permissions**: Postgres provides built-in access control and security through its permission system. ### Negative * **Data tracking**: Changes to the system card are overwritten, making it difficult to maintain a history or audit trail. -* **Complex queries**: Complex queries, especially those involving nested data or formulas in the JSONB blob, -can be inefficient and require custom parsing. -* **Collaboration**: Team members face difficulties working collaboratively on the JSONB data due to its format and -lack of versioning. +* **Complex queries**: Complex queries can be inefficient and require custom parsing. +* **Collaboration**: Collaborating on the JSONB data is challenging due to its complex format and lack of version control. * **Scalability**: As the JSONB blobs grow in size, the storage overhead and query performance may become significant issues. diff --git a/docs/projects/amt/index.md b/docs/projects/amt/index.md index 8963cc04..adbd024f 100644 --- a/docs/projects/amt/index.md +++ b/docs/projects/amt/index.md @@ -1,19 +1,19 @@ # AMT -AMT is the acronym for Algorithm Management Toolkit. AMT has the goal to make algorithmic -systems more transparent; it achieves this by generating standardized reports on the algorithmic system which -encompasses both technical aspects in addition to descriptive information about the system and regulatory assessments. -For both the system and the model the lifecycle is important and this needs to be taken into account. The definition -for an algorithm is derived from the [Algoritmeregister](https://algoritmes.overheid.nl/nl/footer/over-algoritmes). +AMT is the acronym for Algorithm Management Toolkit. The AMT aims to enhance transparency and governance throughout +the entire lifecycle of algorithmic systems. 
By generating standardized reports, AMT provides a comprehensive view +of both technical details and descriptive information, including regulatory assessments, from development to deployment +and beyond. This continuous approach promotes accountability, oversight, and collaboration, ensuring that both models +and data remain transparent, controlled, and validated over time. The definition for an algorithm is derived from the +[Algoritmeregister](https://algoritmes.overheid.nl/nl/footer/over-algoritmes). -One of the goals of the TAD project is providing a standardized format of reporting on an algorithmic +One of the goals of the AMT is providing a standardized format of reporting on an algorithmic system by developing a [Reporting Standard](reporting-standard/index.md). This Reporting Standard consists out of a [System Card](reporting-standard/index.md#system_card) which contains [Model Cards](reporting-standard/index.md#model_card) and [Assessment Cards](reporting-standard/index.md#assessment_card). -The final result of the project is producing System, Model and Assessment Cards with both performance metrics -and technical measurements on fairness and bias of the model, assessments on the system where the specific -algorithm resides, and descriptive information about the system. +The final result of the AMT is producing System, Model and Assessment Cards with performance metrics, (regulatory) +assessments on the system where the specific algorithm resides, and descriptive information about the system. The requirements and instruments are dictated by the [Algoritmekader](https://minbzk.github.io/Algoritmekader/). 
From 77a55f347e5bff730d29a53841a91f750f6c85d4 Mon Sep 17 00:00:00 2001 From: Ravi Meijer Date: Tue, 22 Oct 2024 15:44:52 +0200 Subject: [PATCH 3/5] Include PR comments --- docs/projects/amt/adrs/0008-systemcard-storage.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/docs/projects/amt/adrs/0008-systemcard-storage.md b/docs/projects/amt/adrs/0008-systemcard-storage.md index 48145c18..f1ba1736 100644 --- a/docs/projects/amt/adrs/0008-systemcard-storage.md +++ b/docs/projects/amt/adrs/0008-systemcard-storage.md @@ -9,7 +9,7 @@ the deletion of system_cards. Therefore, we need a different kind of storage to ## Assumptions -* The system card data is small to moderate in size and fits within the capabilities of Postgres' JSONB storage. +* The system card data is small to moderate in size (up to 2GB), making it manageable for SQLite. * Tracking changes to the system card data over time is not a priority in the short term, but may become necessary in the future. @@ -48,3 +48,6 @@ reduces maintenance complexity, and ensures quick data access. * **Complex queries**: Complex queries can be inefficient and require custom parsing. * **Collaboration**: Collaborating on the JSONB data is challenging due to its complex format and lack of version control. * **Scalability**: As the JSONB blobs grow in size, the storage overhead and query performance may become significant issues. +* **Not supported by SQLite**: While SQLite supports JSON through its JSON1 extension, it does not support PostgreSQL's +JSONB data type natively, which complicates local development and testing environments that rely on SQLite as a database +backend. 
From ec49d9ef9494216d5728bb4ac47ebf1cda8d7514 Mon Sep 17 00:00:00 2001 From: Ravi Meijer Date: Wed, 23 Oct 2024 09:20:09 +0200 Subject: [PATCH 4/5] Change from jsonb to json --- docs/index.md | 2 +- docs/projects/amt/adrs/0008-systemcard-storage.md | 11 ++++++----- 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/docs/index.md b/docs/index.md index 99d150f4..b6c8a0de 100644 --- a/docs/index.md +++ b/docs/index.md @@ -17,7 +17,7 @@ We work on the following projects within the _Transparency of Algorithmic Decisi graph TB ak[Algoritmekader] <--> amt - subgraph amt[Algorithm Management Toolkit] + subgraph amt[Algorithm Management Toolkit] tr[Task Registry] --> amp[Algorithm Management Platform] st[Reporting Standard] --> amp amp <--> ai_act_decision_tree[AI Act Decision Tree] diff --git a/docs/projects/amt/adrs/0008-systemcard-storage.md b/docs/projects/amt/adrs/0008-systemcard-storage.md index f1ba1736..1b28bced 100644 --- a/docs/projects/amt/adrs/0008-systemcard-storage.md +++ b/docs/projects/amt/adrs/0008-systemcard-storage.md @@ -9,24 +9,25 @@ the deletion of system_cards. Therefore, we need a different kind of storage to ## Assumptions -* The system card data is small to moderate in size (up to 2GB), making it manageable for SQLite. +* The system card data is small to moderate in size (up to 255MB), making it manageable to store +in databases (in Postgres as well as in SQLite). * Tracking changes to the system card data over time is not a priority in the short term, but may become necessary in the future. ## Decision -The system card of an algorithm system is stored solely as a JSONB blob in the projects table in Postgres, +The system card of an algorithm system is stored solely as a JSON blob in the projects table in Postgres, with no additional storage elsewhere. ## Risks * **Data Overwrite**: As the system card is overwritten with each update, it becomes difficult to track historical changes or revert to previous states. 
-* **Scaling**: As the project grows, managing larger JSONB blobs may present performance challenges, +* **Scaling**: As the project grows, managing larger JSON blobs may present performance challenges, particularly when handling complex queries. -* **Collaboration**: Collaborating on the system card content is more difficult, as the JSONB format +* **Collaboration**: Collaborating on the system card content is more difficult, as the JSON format requires parsing and manual intervention for certain tasks. -* **Limited Querying**: While Postgres supports querying JSONB, complex queries and data manipulations +* **Limited Querying**: While Postgres supports querying and indexing JSON fields, complex queries and data manipulations may be inefficient without proper indexing or further optimization. ## Consequences From 5761f45757771f547b9badaf479a06397dc457b0 Mon Sep 17 00:00:00 2001 From: Ravi Meijer Date: Fri, 25 Oct 2024 13:24:05 +0200 Subject: [PATCH 5/5] Add link to AMT --- docs/projects/amt/index.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/projects/amt/index.md b/docs/projects/amt/index.md index adbd024f..4b2391a8 100644 --- a/docs/projects/amt/index.md +++ b/docs/projects/amt/index.md @@ -1,6 +1,7 @@ # AMT -AMT is the acronym for Algorithm Management Toolkit. The AMT aims to enhance transparency and governance throughout +AMT is the acronym for [Algorithm Management Toolkit](https://amt.prd.apps.digilab.network). +The AMT aims to enhance transparency and governance throughout the entire lifecycle of algorithmic systems. By generating standardized reports, AMT provides a comprehensive view of both technical details and descriptive information, including regulatory assessments, from development to deployment and beyond. This continuous approach promotes accountability, oversight, and collaboration, ensuring that both models