Update our team website with current ambition + ADR on systemcard storage (#256)
ravimeijerrig authored Oct 31, 2024
2 parents 15546dd + 5761f45 commit bb1740e
Showing 3 changed files with 66 additions and 10 deletions.
3 changes: 2 additions & 1 deletion docs/index.md
@@ -17,9 +17,10 @@ We work on the following projects within the _Transparency of Algorithmic Decisi
graph TB
ak[<a href='https://minbzk.github.io/Algoritmekader/'>Algoritmekader</a>] <--> amt
subgraph amt[Algorithm Management Toolkit]
subgraph amt[<a href='https://amt.prd.apps.digilab.network'>Algorithm Management Toolkit</a>]
tr[<a href='https://minbzk.github.io/task-registry'>Task Registry</a>] --> amp[<a href='https://github.com/MinBZK/amt/'>Algorithm Management Platform</a>]
st[<a href='/ai-validation/projects/amt/reporting-standard/'>Reporting Standard</a>] --> amp
amp <--> ai_act_decision_tree[<a href='https://ai-act-decisiontree.apps.digilab.network'>AI Act Decision Tree</a>]
amp <--> llm[<a href='/ai-validation/projects/llm-benchmarks/'>LLM Benchmark Tooling</a>]
end
54 changes: 54 additions & 0 deletions docs/projects/amt/adrs/0008-systemcard-storage.md
@@ -0,0 +1,54 @@
# AMT-0008 System card Storage

## Context

By default, Kubernetes pods use ephemeral storage, which is tied to the pod's lifecycle.
When the pod terminates or restarts, all data is lost. The `/tmp/` directory, being part of
the system's temporary file storage, is cleared during reboots or pod restarts, resulting in
the deletion of system_cards. Therefore, we need a different kind of storage to preserve the data.

## Assumptions

* The system card data is small to moderate in size (up to 255MB), making it manageable to store
in databases (in Postgres as well as in SQLite).
* Tracking changes to the system card data over time is not a priority in the short term, but may become
necessary in the future.

## Decision

The system card of an algorithm system is stored solely as a JSON blob in the projects table in Postgres,
with no additional storage elsewhere.
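
A minimal sketch of what this decision implies, not the actual AMT implementation. It uses Python's standard-library `sqlite3` as a stand-in for Postgres so that it runs anywhere. Only the `projects` table name comes from this ADR; the column names (`id`, `name`, `system_card`), the helper functions, and the card fields are assumptions made for illustration.

```python
import json
import sqlite3

# SQLite (in memory) stands in for Postgres in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    # Hypothetical schema: only the table name `projects` is from the ADR.
    "CREATE TABLE projects ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, system_card TEXT NOT NULL)"
)


def save_system_card(project_id: int, name: str, card: dict) -> None:
    """Store the whole system card as one JSON blob, replacing any earlier one."""
    conn.execute(
        "INSERT OR REPLACE INTO projects (id, name, system_card) VALUES (?, ?, ?)",
        (project_id, name, json.dumps(card)),
    )
    conn.commit()


def load_system_card(project_id: int) -> dict:
    """Read the JSON blob back and parse it into a dict."""
    row = conn.execute(
        "SELECT system_card FROM projects WHERE id = ?", (project_id,)
    ).fetchone()
    if row is None:
        raise KeyError(project_id)
    return json.loads(row[0])


# Two saves for the same project: the second overwrites the first.
save_system_card(1, "demo", {"name": "demo system", "version": "0.1"})
save_system_card(1, "demo", {"name": "demo system", "version": "0.2"})
card = load_system_card(1)
row_count = conn.execute("SELECT COUNT(*) FROM projects").fetchone()[0]
```

After the second save only the latest card remains in the single row, which is the overwrite behaviour that the Risks section describes.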

## Risks

* **Data Overwrite**: As the system card is overwritten with each update, it becomes difficult to track
historical changes or revert to previous states.
* **Scaling**: As the project grows, managing larger JSON blobs may present performance challenges,
particularly when handling complex queries.
* **Collaboration**: Collaborating on the system card content is more difficult, as the JSON format
requires parsing and manual intervention for certain tasks.
* **Limited Querying**: While Postgres supports querying and indexing JSON fields, complex queries and data manipulations
may be inefficient without proper indexing or further optimization.

## Consequences

### Positive

* **Fast implementation**: The solution is easy to set up, reducing the time to get the project operational.
* **Future proof**: This approach is designed with future scalability in mind. While system cards will initially
be stored in Postgres as JSONB blobs, we anticipate migrating to a Git-based local or remote storage solution
as the system evolves. Importantly, this initial decision allows for a seamless transition in the future,
ensuring no obstacles to migration.
* **Single source & Fast access**: Centralizing everything in a single Postgres database streamlines backups,
reduces maintenance complexity, and ensures quick data access.
* **Built-in permissions**: Postgres provides built-in access control and security through its permission system.

### Negative

* **Data tracking**: Changes to the system card are overwritten, making it difficult to maintain a history or audit trail.
* **Complex queries**: Complex queries can be inefficient and require custom parsing.
* **Collaboration**: Collaborating on the JSONB data is challenging due to its complex format and lack of version control.
* **Scalability**: As the JSONB blobs grow in size, the storage overhead and query performance may become significant issues.
* **Not supported by SQLite**: While SQLite supports JSON through its JSON1 extension, it does not support PostgreSQL's
JSONB data type natively, which complicates local development and testing environments that rely on SQLite as a database
backend.
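
The "Complex queries" and "Not supported by SQLite" points can be illustrated with a small sketch. It assumes the blob is stored as plain JSON text (which works on both backends) and that filtering happens in application code. The `status` field, the sample cards, and the `find_by_field` helper are hypothetical and not part of AMT.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    # Hypothetical schema: the card is kept as JSON text, not JSONB.
    "CREATE TABLE projects (id INTEGER PRIMARY KEY, system_card TEXT NOT NULL)"
)
cards = [
    {"name": "system a", "status": "draft"},
    {"name": "system b", "status": "published"},
    {"name": "system c", "status": "draft"},
]
conn.executemany(
    "INSERT INTO projects (system_card) VALUES (?)",
    [(json.dumps(card),) for card in cards],
)


def find_by_field(field: str, value: str) -> list[str]:
    """Return names of system cards whose top-level `field` equals `value`.

    Every row is fetched and parsed in Python, so the cost grows with both
    the number of projects and the size of each blob.
    """
    matches = []
    for (raw,) in conn.execute("SELECT system_card FROM projects ORDER BY id"):
        card = json.loads(raw)
        if card.get(field) == value:
            matches.append(card["name"])
    return matches


drafts = find_by_field("status", "draft")
```

This approach is portable between Postgres and SQLite because it relies on no database-specific JSON operators, at the price of a full table scan and parse for every query.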
19 changes: 10 additions & 9 deletions docs/projects/amt/index.md
@@ -1,19 +1,20 @@
# AMT

AMT is the acronym for Algorithm Management Toolkit. AMT has the goal to make algorithmic
systems more transparent; it achieves this by generating standardized reports on the algorithmic system which
encompasses both technical aspects in addition to descriptive information about the system and regulatory assessments.
For both the system and the model the lifecycle is important and this needs to be taken into account. The definition
for an algorithm is derived from the [Algoritmeregister](https://algoritmes.overheid.nl/nl/footer/over-algoritmes).
AMT is the acronym for [Algorithm Management Toolkit](https://amt.prd.apps.digilab.network).
The AMT aims to enhance transparency and governance throughout
the entire lifecycle of algorithmic systems. By generating standardized reports, AMT provides a comprehensive view
of both technical details and descriptive information, including regulatory assessments, from development to deployment
and beyond. This continuous approach promotes accountability, oversight, and collaboration, ensuring that both models
and data remain transparent, controlled, and validated over time. The definition for an algorithm is derived from the
[Algoritmeregister](https://algoritmes.overheid.nl/nl/footer/over-algoritmes).

One of the goals of the TAD project is providing a standardized format of reporting on an algorithmic
One of the goals of the AMT is providing a standardized format of reporting on an algorithmic
system by developing a [Reporting Standard](reporting-standard/index.md). This Reporting Standard consists of a
[System Card](reporting-standard/index.md#system_card) which contains
[Model Cards](reporting-standard/index.md#model_card) and
[Assessment Cards](reporting-standard/index.md#assessment_card).

The final result of the project is producing System, Model and Assessment Cards with both performance metrics
and technical measurements on fairness and bias of the model, assessments on the system where the specific
algorithm resides, and descriptive information about the system.
The final result of the AMT is producing System, Model and Assessment Cards with performance metrics, (regulatory)
assessments on the system where the specific algorithm resides, and descriptive information about the system.

The requirements and instruments are dictated by the [Algoritmekader](https://minbzk.github.io/Algoritmekader/).
