diff --git a/search/search_index.json b/search/search_index.json index 4eb03f3a..102aa09c 100644 --- a/search/search_index.json +++ b/search/search_index.json @@ -1 +1 @@ -{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Home","text":"
Here we are documenting the processes and work of the AI Validation Team at the Ministry of the Interior and Kingdom Relations in the Netherlands.
We are a team of engineers, UX designers & researchers, and product experts at a policy department.
We work on the following projects within the Transparency of Algorithmic Decision making scope:
graph TB\n ak[<a href='https://minbzk.github.io/Algoritmekader/'>Algoritmekader</a>] <--> tmt\n\n subgraph tmt[Algorithm Management Toolkit]\n st[<a href='/ai-validation/projects/tad/reporting-standard/'>Reporting Standard</a>] --> tad[<a href='https://github.com/MinBZK/tad/'>Algorithm Management Platform</a>]\n tad <--> llm[<a href='/ai-validation/projects/llm-benchmarks/'>LLM Benchmark Tooling</a>]\n end\n\n tmt --> ar[<a href='https://algoritmes.overheid.nl/en/'>The Algorithm Register of the Dutch government</a>]\n tmt --> or[Other registries]
"},{"location":"#contribute","title":"Contribute","text":"Read our guide on how to contribute.
"},{"location":"#contact","title":"Contact","text":"Our contact details are here.
"},{"location":"about/contact/","title":"Contact","text":"Contact us at ai-validatie@minbzk.nl.
"},{"location":"about/team/","title":"Our Team","text":""},{"location":"about/team/#robbert-bos","title":"Robbert Bos","text":"Product Owner
Robbert has been on a mission for over 15 years to enhance the transparency and collaboration within AI projects. Before joining this team, he founded several data science and tech companies (partly) dedicated to this cause. Robbert is passionate about solving complex problems where he connects business needs with technology and involves others in how these solutions can improve their work.
robbertbos
Robbert Bos
"},{"location":"about/team/#lucas-haitsma","title":"Lucas Haitsma","text":"Researcher in Residence
Lucas is a PhD candidate conducting research into the regulation and governance of algorithmic discrimination by supervision and enforcement organizations. Lucas is our Researcher in Residence.
Lucas Haitsma
rug.nl
"},{"location":"about/team/#berry-den-hartog","title":"Berry den Hartog","text":"Engineer
Berry is a software engineer passionate about problem-solving and system optimization, with expertise in Go, Python, and C++. Specialized in architecting high-volume data processing systems and implementing Lean-Agile and DevOps practices. Experienced in managing end-to-end processes from hardware provisioning to software deployment and release.
berrydenhartog
Berry den Hartog
"},{"location":"about/team/#anne-schuth","title":"Anne Schuth","text":"Engineering Manager
Anne used to be a Machine Learning Engineering Manager at Spotify and previously held roles at DPG Media, Blendle, and Google AI. He holds a PhD from the University of Amsterdam.
anneschuth
Anne Schuth
anneschuth.nl
"},{"location":"about/team/#christopher-spelt","title":"Christopher Spelt","text":"Engineer
After graduating in pure mathematics, Christopher transitioned into machine learning. He is passionate about solving complex problems, especially those that have a societal impact. His expertise lies in math and machine learning theory, and he is skilled in Python.
ChristopherSpelt
Christopher Spelt
"},{"location":"about/team/#robbert-uittenbroek","title":"Robbert Uittenbroek","text":"Engineer
Robbert is a highly enthusiastic full-stack engineer with a Bachelor's degree in Computer Science from the Hanze University of Applied Sciences in Groningen. He is passionate about building secure, compliant, and ethical solutions, and thrives in collaborative environments. Robbert is eager to leverage his skills and knowledge to help shape and propel the future of IT within the government.
uittenbroekrobbert
Robbert Uittenbroek
"},{"location":"about/team/#laurens-weijs","title":"Laurens Weijs","text":"Engineer
Laurens is a passionate guy with a love for innovation and doing things differently. With a background in Econometrics and Computer Science, he loves to tackle the IT challenges of the Government by helping other people through extensive knowledge sharing on stage, building neural networks himself, or building a strong community.
laurensWe
Laurens Weijs
"},{"location":"about/team/#guusje-juijn","title":"Guusje Juijn","text":"Trainee
Guusje is currently enrolled in a two-year traineeship at the Dutch Government. After finishing her first assignment at a policy department, she is excited to bring her knowledge about AI policy to a technical team. Guusje has a background in Artificial Intelligence, is experienced in Python and machine learning and has a strong interest in AI ethics.
GuusjeJuijn
Guusje Juijn
"},{"location":"about/team/#ruben-rouwhof","title":"Ruben Rouwhof","text":"UX/UI Designer
Ruben is a dedicated UX/UI Designer focused on crafting user-centric digital experiences. He is involved in projects from start to finish, covering user research, design, and technical implementation.
rubenrouwhof
Ruben Rouwhof
rubenrouwhof.nl
"},{"location":"about/team/#ravi-meijer","title":"Ravi Meijer","text":"Product Researcher
Ravi is an accomplished data scientist with expertise in machine learning, responsible AI, and the data science lifecycle. Her background in AI fuels her passion for solving complex problems and driving innovation for positive social impact.
ravimeijerrig
Ravi Meijer
"},{"location":"about/team/#our-alumni","title":"Our Alumni","text":""},{"location":"about/team/#willy-tadema","title":"Willy Tadema","text":"AI Ethics Lead
Willy specializes in AI governance, AI risk management, AI assurance and ethics-by-design. She is an advocate of AI standards and a member of several ethics committees.
FrieseWoudloper
Willy Tadema
"},{"location":"adrs/0001-adrs/","title":"ADR-0001 ADRs","text":""},{"location":"adrs/0001-adrs/#context","title":"Context","text":"In modern software development practices, the use of Architecture Decision Records (ADRs) has become increasingly common. ADRs are documents that capture important architectural decisions made during the development process. These decisions play a crucial role in guiding the development team and ensuring consistency and coherence in the architecture of the software system.
"},{"location":"adrs/0001-adrs/#assumptions","title":"Assumptions","text":"We will utilize ADRs in our team to document and communicate architectural decisions effectively. Furthermore, we will publish these ADRs publicly to promote transparency and facilitate collaboration.
"},{"location":"adrs/0001-adrs/#template","title":"Template","text":"Use the template below to add an ADR:
# ADR-XXXX Title\n\n## Context\n\nWhat is the issue that we're seeing that is motivating this decision or change?\n\n## Assumptions\n\nAnything that could cause problems if untrue now or later. (optional)\n\n## Decision\n\nWhat is the change that we're proposing and/or doing?\n\n## Risks\n\nAnything that could cause malfunction, delay, or other negative impacts. (optional)\n\n## Consequences\n\nWhat becomes easier or more difficult to do because of this change?\n\n## More Information\n\nProvide additional evidence/confidence for the decision outcome.\nLinks to other decisions and resources might appear here as well. (optional)\n
"},{"location":"adrs/0002-code-platform/","title":"ADR-0002 Code Platform","text":""},{"location":"adrs/0002-code-platform/#context","title":"Context","text":"In the landscape of software development, the choice of coding platform significantly impacts developer productivity, collaboration, and code quality. it's crucial to evaluate and select a coding platform that aligns with our development needs and fosters efficient workflows.
"},{"location":"adrs/0002-code-platform/#assumptions","title":"Assumptions","text":"The following assumptions are made:
After careful consideration and evaluation of various options like GitHub, GitLab, and Bitbucket, we propose adopting GitHub as our primary coding platform. The decision is based on the following factors:
Costs: There are currently no costs associated with using GitHub for our use cases.
Features and Functionality: GitHub offers a comprehensive set of features essential for modern software development and collaboration with external teams, including version control, code review, issue tracking, continuous integration, and deployment automation.
Security: GitHub offers a complete set of security features essential to secure development like dependency management and security scanning.
Community and Ecosystem: GitHub boasts a vibrant community and ecosystem, facilitating knowledge sharing, collaboration, and access to third-party tools and services that can enhance our development workflows. Within our organization we have easy access to the team managing the GitHub organization.
Usability and User Experience: A user-friendly interface and intuitive workflows are essential for maximizing developer productivity and minimizing onboarding time. GitHub offers a streamlined user experience and customizable workflows that align with our team's preferences and practices.
"},{"location":"adrs/0002-code-platform/#risks","title":"Risks","text":"Currently the organization of MinBZK on GitHub does not have a lot of people
indicating that our team is an early adapter of the platform within the organization. This might impact our features due to cost constrains.
If we choose another tool in the future we will need to migrate our codebase and potentially rewrite some GitHub-specific features that cannot be used in another tool.
"},{"location":"adrs/0002-code-platform/#more-information","title":"More Information","text":"Alternatives considered:
Our development team wants to implement a CI/CD solution to streamline the build, testing, and deployment workflows of our software products. Currently, our codebase resides on GitHub, and we leverage Kubernetes as our chosen orchestration platform, managed by the DigiLab platform team.
"},{"location":"adrs/0003-ci-cd/#decision","title":"Decision","text":"We will use the following tools for CI/CD pipeline:
GitHub Actions aligns with our existing infrastructure, ensuring seamless integration with our codebase and minimizing operational overhead. GitHub Actions' specific syntax for CI results in vendor lock-in, necessitating significant effort to migrate to an alternative CI system in the future.
Flux, being a GitOps operator for Kubernetes, offers a declarative approach to managing deployments, enhancing reliability and repeatability within our Kubernetes ecosystem.
"},{"location":"adrs/0004-software-hosting-platform/","title":"ADR-0004 Software hosting platform","text":""},{"location":"adrs/0004-software-hosting-platform/#context","title":"Context","text":"Our team recognizes the necessity of a platform to run our software, as our local machines lack the capacity to handle certain workloads effectively. We have evaluated several options available to us:
We operate under the following assumptions:
We will use Digilab Kubernetes for our workloads.
"},{"location":"adrs/0004-software-hosting-platform/#consequences","title":"Consequences","text":"By choosing Digilab Kubernetes, we gain access to a namespace within their managed Kubernetes cluster. However, it's important to note that Digilab does not provide any guarantees regarding the availability of the cluster. Should our software require higher availability assurances, we may need to explore alternative solutions.
"},{"location":"adrs/0005-python-tooling/","title":"ADR-0005 Python coding standard and tools","text":""},{"location":"adrs/0005-python-tooling/#context","title":"Context","text":"In modern software development, maintaining code quality is crucial for readability, maintainability, and collaboration. Python, being a dynamically typed language, requires robust tooling to ensure code consistency and type safety. Manual enforcement of coding standards is time-consuming and error-prone. Hence, adopting automated tooling to streamline this process is imperative.
"},{"location":"adrs/0005-python-tooling/#decision","title":"Decision","text":"We will use these standards and tools for our own projects:
When working with external projects these coding standards will not always be possible, but we will try to integrate them as much as possible.
"},{"location":"adrs/0005-python-tooling/#consequences","title":"Consequences","text":"Improved Code Quality: Adoption of these tools will lead to improved code quality, consistency, and maintainability across the project.
Enhanced Developer Productivity: Automated code formatting and static type checking will reduce manual effort and free developers to focus more on coding logic rather than formatting and type-related issues.
Reduced Bug Incidence: Static typing and linting will catch potential bugs and issues early in the development process, reducing the likelihood of runtime errors and debugging efforts.
Standardized Development Workflow: By integrating pre-commit hooks, the development workflow will be standardized, ensuring that all developers follow the same code quality standards.
"},{"location":"adrs/0006-agile-tooling/","title":"ADR-0006 Agile tooling","text":""},{"location":"adrs/0006-agile-tooling/#context","title":"Context","text":"Our development team wants to enhance transparency and productivity in our software development processes. We are using GitHub for version control and collaboration. However, to further streamline our process, there is a need to incorporate tooling for managing the effort of our team.
"},{"location":"adrs/0006-agile-tooling/#decision","title":"Decision","text":"We will use GitHub Projects as our agile process tool
"},{"location":"adrs/0006-agile-tooling/#consequences","title":"Consequences","text":"GitHub Projects seamlessly integrates with our existing GitHub repositories, allowing us to manage our Agile processes. within the same ecosystem where our code resides. This integration eliminates the need for additional third-party tools, simplifying our workflow.
"},{"location":"adrs/0007-commit-convention/","title":"ADR-0007 Commit convention","text":""},{"location":"adrs/0007-commit-convention/#context","title":"Context","text":"In software development, maintaining clear and consistent commit message conventions is crucial for effective collaboration, code review, and project management. Commit messages serve as a form of documentation, helping developers understand the changes introduced by each commit without having to analyze the code diff extensively.
"},{"location":"adrs/0007-commit-convention/#decision","title":"Decision","text":"A commit message must follow the following rules:
\\<ref>-\\<ticketnumber>: subject line
An example of a commit message:
Fix foo to enable bar
or
AB-1234: Fix foo to enable bar
or
Fix foo to enable bar
This fixes the broken behavior of component abc caused by problem xyz.
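To make the convention checkable, a commit-msg hook along the following lines could be used. This is a minimal sketch: the exact pattern (an optional ref-ticketnumber prefix followed by a capitalized subject) is an assumption based on the examples above.

```python
#!/usr/bin/env python3
# Sketch of a commit-msg hook for the convention above; the regex is an
# assumption based on the examples: an optional "AB-1234: " style prefix
# followed by a capitalized subject line.
import re
import sys

SUBJECT = re.compile(r"^(?:[A-Z]+-\d+: )?[A-Z].*$")

def main(msg_file: str) -> int:
    with open(msg_file, encoding="utf-8") as f:
        subject = f.readline().rstrip("\n")
    if not SUBJECT.match(subject):
        print(f"Commit subject does not match convention: {subject!r}")
        return 1
    return 0

if __name__ == "__main__":
    raise SystemExit(main(sys.argv[1]))
```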
If we contribute to projects not started by us we try to follow the above standard unless a specific convention is obvious or required by the project.
"},{"location":"adrs/0007-commit-convention/#consequences","title":"Consequences","text":"In some repositories Conventional Commits are used. This ADR does not follow conventional commits.
"},{"location":"adrs/0008-architectural-diagram-tooling/","title":"ADR-0008 Architectural Diagram Tooling","text":""},{"location":"adrs/0008-architectural-diagram-tooling/#context","title":"Context","text":"To communicate our designs in a graphical manner, it is of importance to draw architectural diagrams. For this we use tooling, that supports us in our work. We need to have something that is written so that it can be processed by both people and machine, and we want to have version control on our diagrams.
"},{"location":"adrs/0008-architectural-diagram-tooling/#decision","title":"Decision","text":"We will write our architectural diagrams in Markdown-like (.mmmd) in the Mermaid Syntax to edit these diagrams one can use the various plugins. For each project where it is needed, we will add the diagrams in the repository of the subject. The level of detail we will provide in the diagrams is according to the C4-model metamodel on architecture diagramming.
"},{"location":"adrs/0008-architectural-diagram-tooling/#consequences","title":"Consequences","text":"Standardized Workflow: By maintaining architecture as code, it will be standardized in our workflow.
Version control on diagrams: By using version control, we will be able to collaborate easier on the diagrams, and we will be able to see the history of them.
Diagrams are in .md format: By storing our diagrams next to our code, they will be where you need them the most.
"},{"location":"adrs/0010-container-registry/","title":"ADR-0010 Container Registry","text":""},{"location":"adrs/0010-container-registry/#context","title":"Context","text":"Containers allow us to package and run applications in a standardized and portable way. To be able to (re)use and share images, they need to be stored in a registry that is accessible by others.
There are many container registries. During research the following registries have been noted:
Docker Hub, GitHub Container Registry, Amazon Elastic Container Registry (ECR), Azure Container Registry (ACR), Google Artifact Registry (GAR), Red Hat Quay, GitLab Container Registry, Harbor, Sonatype Nexus Repository Manager, JFrog Artifactory.
"},{"location":"adrs/0010-container-registry/#assumptions","title":"Assumptions","text":"We will use GitHub Container Registry.
This aligns best with the previously made choices for GitHub as a code repository and CI/CD workflow.
"},{"location":"adrs/0010-container-registry/#risks","title":"Risks","text":"Traditionally, Docker Hub has been the place to publish images. Therefore, our images may be more difficult to discover.
The following assumptions are not (directly) covered by the chosen registry:
By using GitHub Container Registry we have a container registry we can both use internally and share with others. This has low impact: we can always move to another registry, since the Open Container Initiative image format is standardized.
"},{"location":"adrs/0010-container-registry/#more-information","title":"More Information","text":"The following sites have been consulted:
The AI validation team works transparently. Working with public funds warrants transparency toward the public. Additionally, being transparent aligns with the team's mission of increasing the transparency of public organizations. In line with this reasoning, it is important to be open to researchers interested in the work of the AI validation team. Allowing researchers to conduct research within the team contributes to transparency and enables external perspectives and feedback to be incorporated into the team's work.
"},{"location":"adrs/0011-researcher-in-residence/#assumptions","title":"Assumptions","text":"We have decided to include a researcher in residence as a member of our team.
The researcher in residence takes the following form:
The following conditions apply to the researcher in residence.
Risks around a potential chilling effect (team members not feeling free to express themselves) are mitigated by the conditions we impose. In light of the form and conditions above, we see no further significant risks.
"},{"location":"adrs/0011-researcher-in-residence/#consequences","title":"Consequences","text":"Including a researcher in residence makes it easier for them to conduct research within both the team and the wider organization where the AI validation team operates. This benefits the quality of the research findings and the feedback provided to the team and organization.
"},{"location":"adrs/0012-dictionary-for-spelling/","title":"ADR-0012 Dictionary for spelling","text":""},{"location":"adrs/0012-dictionary-for-spelling/#context","title":"Context","text":"We use English as language in some of our external communications, like on GitHub. We noticed that among different documents certain words are spelled correctly but differently, depending on the author or dictionary used. Also there are occasional typos which can cause distraction and don't meet professional standards.
"},{"location":"adrs/0012-dictionary-for-spelling/#assumptions","title":"Assumptions","text":"Standardizing the used dictionary avoids discussion on spelling and makes documents consistent. Eliminating typos contributes to professional, credible and unambiguous documents.
Using a dictionary in a pre-commit hook will prevent commits being made with obvious spelling issues.
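As an illustration of how such a hook could work, here is a minimal sketch in Python. The dictionary location and the word-matching rules are assumptions; in practice an existing spell-checking tool would likely be wired into pre-commit instead.

```python
#!/usr/bin/env python3
# Minimal sketch of a spelling check for a pre-commit hook. The dictionary
# path is an assumption (a plain-text U.S. English word list); a real setup
# would likely use an existing spell-checking tool instead.
import re
import sys

def load_dictionary(path: str) -> set[str]:
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f}

def main(files: list[str]) -> int:
    words = load_dictionary("/usr/share/dict/american-english")  # assumed path
    status = 0
    for name in files:
        with open(name, encoding="utf-8") as f:
            for lineno, line in enumerate(f, start=1):
                for word in re.findall(r"[A-Za-z]+", line):
                    if word.lower() not in words:
                        print(f"{name}:{lineno}: unknown word {word!r}")
                        status = 1
    return status

if __name__ == "__main__":
    raise SystemExit(main(sys.argv[1:]))
```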
"},{"location":"adrs/0012-dictionary-for-spelling/#decision","title":"Decision","text":"We will use the U.S. English spelling dictionary.
"},{"location":"adrs/0012-dictionary-for-spelling/#risks","title":"Risks","text":"It may slow down committing large files.
"},{"location":"adrs/0012-dictionary-for-spelling/#consequences","title":"Consequences","text":"Documents will all use the same dictionary for spelling and will not contain typos.
"},{"location":"adrs/0013-date-time-representation/","title":"ADR-0013 Date Time Representation: ISO 8601","text":""},{"location":"adrs/0013-date-time-representation/#context","title":"Context","text":"In our software development projects, we have encountered ambiguity related to the representation of dates and times, particularly when dealing with time zones. The lack of a standardized approach has led to discussions and possibly ambiguity when interpreting timestamps within our applications.
"},{"location":"adrs/0013-date-time-representation/#assumptions","title":"Assumptions","text":"Standardizing the representation of dates and times will improve clarity and precision in our application's logic and user interfaces.
The ISO 8601 format is more human-readable than other formats such as Unix timestamps.
"},{"location":"adrs/0013-date-time-representation/#decision","title":"Decision","text":"We adopt ISO 8601 with timezone notation, preferably in UTC (Z
), as the standard method for representing dates and times in our software projects, replacing the usage of Unix timestamps or any other formats or timezones. We use both dashes (-
) and colons (:
).
We store date and time as: 2024-04-16T16:48:14Z
(preferably with Z
as timezone, representing UTC)
We store dates as 2024-04-16
.
Only when capturing client events might we want to store the client timezone instead of UTC.
When rendering a date and time in a user interface, we may want to localize the date and time for the appropriate timezone.
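A small illustration of these conventions, using only the Python standard library:

```python
# Sketch of the date and time conventions using only the standard library.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

now = datetime.now(timezone.utc)

# Store date-times in UTC with the Z suffix, e.g. 2024-04-16T16:48:14Z.
stored_datetime = now.replace(microsecond=0).isoformat().replace("+00:00", "Z")

# Store plain dates as e.g. 2024-04-16.
stored_date = now.date().isoformat()

# When rendering in a user interface, localize for the user's timezone.
localized = now.astimezone(ZoneInfo("Europe/Amsterdam"))

print(stored_datetime, stored_date, localized.isoformat())
```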
"},{"location":"adrs/0013-date-time-representation/#risks","title":"Risks","text":"Increased storage space: ISO 8601 representations can be longer than other formats, leading to potential increases in storage requirements, especially when dealing with large datasets.
"},{"location":"adrs/0013-date-time-representation/#consequences","title":"Consequences","text":"A single ISO 8601 with UTC timezone provides a clear and unambiguous way to represent dates and times. Its format is easily recognizable and eliminates the need for interpretation. For example: 2024-04-15T10:00:00Z
can easily be understood without needing to parse it using a library.
We will need to regularly convert from localized time to UTC and back when capturing, storing, and rendering dates and times.
"},{"location":"adrs/0013-date-time-representation/#more-information","title":"More Information","text":"ISO 8601 is an internationally recognized standard endorsed by the International Organization for Standardization (ISO). Its adoption offers numerous benefits, including improved clarity, global accessibility, and future-proofing of systems and applications.
For further reading on ISO 8601:
In order to expand our reach and foster international collaboration in the field of AI Validation, we have decided to conduct all communication in English on public platforms such as GitHub. This decision aims to facilitate better understanding and participation from our global colleagues. However, within the Government of the Netherlands, the norm is to communicate in Dutch for internal purposes. This ADR will provide guidelines on which language to use for different types of communications.
"},{"location":"adrs/0014-written-language/#assumptions","title":"Assumptions","text":"There is no requirement to use Dutch as the primary language for all our activities while working for the Government of the Netherlands. More information can be found in the More Information section.
"},{"location":"adrs/0014-written-language/#decision","title":"Decision","text":"The following channels will utilize English:
The primary language for the following channels will be Dutch:
Dutch-only developers will have a harder time following the progress of our team, both in our code on GitHub and in our project management.
"},{"location":"adrs/0014-written-language/#consequences","title":"Consequences","text":"Although many attempts by previous cabinets, Dutch is not the official language in the Netherlands according to the Dutch constitution. See the following link.
According to the website of the Government of the Netherlands, Dutch is the officially recognized language. In combination with the law Algemene wet bestuursrecht on wetten.overheid.nl, this means that governing bodies and their employees need to communicate in Dutch unless stated differently elsewhere. It is stated there that communicating in a language other than Dutch is permitted if the goal of communicating in another language is sufficiently justified and if other parties are not affected disproportionately by the usage of another language.
Right now we have a few organizations (Logius, SSC-ICT, ODC-Noord, Tender process, Digilab, etc.) offering IT infrastructure. This ADR gives an overview of what these different organizations offer, as well as a decision for the AI Validation team on which infrastructure provider we will focus.
"},{"location":"adrs/0016-government-cloud-comparison/#descriptions-and-comparison","title":"Descriptions and comparison","text":"Please see the following picture for an overview of the providers in relation to what they can provide, currently we are heavily searching in the realm of unmanaged infrastructure, as we want this to manage ourselves.
"},{"location":"adrs/0016-government-cloud-comparison/#decision","title":"Decision","text":"For our infrastructure provider we decided to go with Digilab as the main source, as they can provide us with a Kubernetes namespace and are a reliable and convenient partner as we work closely with them.
"},{"location":"adrs/0016-government-cloud-comparison/#risks","title":"Risks","text":"Certain choices are made for us if we make use of the Kubernetes namespace of Digilab, for example that we need to make use of Flux for our CI/CD pipeline.
"},{"location":"adrs/0016-government-cloud-comparison/#extra-information","title":"Extra information","text":"Large Languages Models (LLMs) are becoming increasingly popular in assisting people in a variety of tasks. These tasks include, but are not limited to, information retrieval, assisting with coding and essay writing. In the context of the government, tasks can include for example supporting Freedom of Information Act (FOIA) requests and aiding in answering questions of citizens.
While the potential benefit of using LLMs is large, there are also significant risks. Fundamentally, an LLM is a next-token predictor, which bases its predictions on the user input (context) and on compressed information seen during training (the LLM parameters); hence there is no guarantee on the quality and correctness of the output. Moreover, due to bias in the training data, LLMs can have bias in their output, despite best efforts to mitigate this. Additionally, we have human values that we expect LLMs to be aligned with. Certainly, within the context of a government, we should take utmost care not to discriminate. To assess the quality, correctness, bias and alignment with human values of an LLM, one can perform benchmarks.
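To make the "next-token predictor" point concrete, here is a minimal sketch using the Hugging Face transformers library; the model choice (gpt2) is illustrative only.

```python
# An LLM fundamentally outputs a probability distribution over the next
# token given the context; everything else is repeated sampling from it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative model choice
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The Netherlands is a country in", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, sequence, vocab)

# Probabilities for the token that would come next.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id)!r}: {prob:.3f}")
```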
"},{"location":"projects/llm-benchmarks/#the-project","title":"The project","text":"The LLM Benchmarks project of the AI Validation Team aims to create a platform where LLMs can be measured across a wide range of benchmarks. We limit ourselves to LLMs and benchmarks that are related to the Dutch society. Both LLMs and the benchmarks can be configured by users of the platform. Users can run these benchmarks on LLMs on our platform. The intended goal of this project is to give government organizations, citizens and companies insight in the various LLMs and their quality, correctness, bias and alignment with human values. The project also encompasses a dashboard with uploaded LLMs and their performance on uploaded benchmarks. With this platform we aim to enhance public trust in the usage of LLMs and expose potential bias that exists within LLMs.
"},{"location":"projects/tad/","title":"TAD","text":"TAD is the acronym for Transparency of Algorithmic Decision making. TAD has the goal to make algorithmic systems more transparent; it achieves this by generating standardized reports on the algorithmic system which encompasses both technical aspects in addition to descriptive information about the system and regulatory assessments. For both the system and the model the lifecycle is important and this needs to be taken into account. The definition for an algorithm is derived from the Algoritmeregister.
One of the goals of the TAD project is providing a standardized format for reporting on an algorithmic system by developing a Reporting Standard. This Reporting Standard consists of a System Card which contains Model Cards and Assessment Cards.
The final result of the project is producing System, Model and Assessment Cards with both performance metrics and technical measurements on fairness and bias of the model, assessments on the system where the specific algorithm resides, and descriptive information about the system.
The requirements and instruments are dictated by the Algoritmekader.
"},{"location":"projects/tad/comparison/","title":"Comparison of Reporting Standards","text":"This document assesses standards that standardize the way algorithm assessments can be captured.
"},{"location":"projects/tad/comparison/#background","title":"Background","text":"There are many algorithm assessments (e.g. IAMA, HUIDERIA, etc.), technical tests on performance (e.g. Accuracy, TP, FP, F1, etc), fairness and bias of algorithms (e.g. SHAP) and reporting formats available. The goal is to have a way of standardizing the way these different assessments and tests can be captured.
"},{"location":"projects/tad/comparison/#available-standards","title":"Available standards","text":""},{"location":"projects/tad/comparison/#model-cards","title":"Model Cards","text":"The most interesting existing capturing methods seem to be all based on Model Cards for Model Reporting, which are:
\"Short documents accompanying trained machine learning models that provide benchmarked evaluation in a variety of conditions, such as across different cultural, demographic, or phenotypic groups (e.g., race, geographic location, sex, Fitzpatrick skin type) and intersectional groups (e.g., age and race, or sex and Fitzpatrick skin type) that are relevant to the intended application domains. Model cards also disclose the context in which models are intended to be used, details of the performance evaluation procedures, and other relevant information\", proposed by Google. Note that \"The proposed set of sections\" in the Model Cards paper \"are intended to provide relevant details to consider, but are not intended to be complete or exhaustive, and may be tailored depending on the model, context, and stakeholders.\"
Many companies implement their own version of Model Cards, for example Meta System Cards and the tools mentioned in the next section.
"},{"location":"projects/tad/comparison/#automatic-model-card-generation","title":"Automatic model card generation","text":"There exist tools to (semi)-automatically generate models cards:
A landscape analysis of ML documentation tools has been performed by Hugging Face and provides a good overview of the current landscape.
Another interesting standard is the Algorithmic Transparency Recording Standard of the United Kingdom Government, which can be found here.
"},{"location":"projects/tad/comparison/#proposal","title":"Proposal","text":"We need a standard that captures algorithmic assessments and technical tests on model and datasets. The idea of model cards can serve as a guiding theoretical principle on how to implement such a standard. More specifically, we can draw inspiration from the existing model card schema's and implementations of VerifyML and Hugging Face. We note the following:
Hence in any case we need to extend one of these standards. We propose to:
In modern software development practices, the use of Architecture Decision Records (ADRs) has become increasingly common. ADRs are documents that capture important architectural decisions made during the development process. These decisions play a crucial role in guiding the development team and ensuring consistency and coherence in the architecture of the software system.
"},{"location":"projects/tad/adrs/0001-adrs/#assumptions","title":"Assumptions","text":"We will utilize ADRs in this project repository and communicate architectural decisions effectively. Furthermore, we will publish these ADRs publicly to promote transparency and facilitate collaboration.
"},{"location":"projects/tad/adrs/0001-adrs/#template","title":"Template","text":"Use the template below to add an ADR:
# TAD-XXXX Title\n\n## Context\n\nWhat is the issue that we're seeing that is motivating this decision or change?\n\n## Assumptions\n\nAnything that could cause problems if untrue now or later. (optional)\n\n## Decision\n\nWhat is the change that we're proposing and/or doing?\n\n## Risks\n\nAnything that could cause malfunction, delay, or other negative impacts. (optional)\n\n## Consequences\n\nWhat becomes easier or more difficult to do because of this change?\n\n## More Information\n\nProvide additional evidence/confidence for the decision outcome.\nLinks to other decisions and resources might appear here as well. (optional)\n
"},{"location":"projects/tad/adrs/0002-tad-reporting-standard/","title":"TAD-0002 TAD Reporting Standard","text":""},{"location":"projects/tad/adrs/0002-tad-reporting-standard/#context","title":"Context","text":"The TAD Reporting Standard proposes a standardized way of capturing information of ML-models and systems.
"},{"location":"projects/tad/adrs/0002-tad-reporting-standard/#assumptions","title":"Assumptions","text":"There is no existing standard of capturing all relevant information on ML-models that also includes fairness and bias tests and regulatory assessments.
A widely used implementation for Model Cards for Model Reporting is given by the Hugging Face Model Card metadata specification, which in turn is based on Papers with Code Model Index. This implementation does not capture sufficient details about metrics and does not include measurements from technical tests on bias and fairness or regulatory assessments.
"},{"location":"projects/tad/adrs/0002-tad-reporting-standard/#decision","title":"Decision","text":"We decided to implement a custom reporting standard. Our reporting standard can be split up into three elements.
We were heavily inspired by the Hugging Face Model Card metadata specification, which we essentially extended to allow for:
The extension is not strict, meaning that there the TAD Reporting Standard is not a valid Hugging Face metadata specification. The reason for this is that some fields in the Hugging Face standard are too much intertwined with the Hugging Face ecosystem and it would not be logical for us to couple our implementation this tightly to Hugging Face.
"},{"location":"projects/tad/adrs/0002-tad-reporting-standard/#risks","title":"Risks","text":"The TAD Reporting Standard is not fully backwards compatible with the Hugging Face Model Card metadata specification. If in the future the Hugging Face Model Card metadata specification becomes a standard, we might need to revise the TAD standard.
"},{"location":"projects/tad/adrs/0002-tad-reporting-standard/#consequences","title":"Consequences","text":"The TAD Reporting Standard allows us to capture relevant information on model performance, bias and fairness and regulatory assessments in a standardized way.
"},{"location":"projects/tad/adrs/0003-tad-tool/","title":"TAD-0003 Tool for Transparency of Algorithmic Decision making","text":""},{"location":"projects/tad/adrs/0003-tad-tool/#context","title":"Context","text":"We are considering tooling for organizations to get more grip on their algorithms. Tooling for, for instance bias and fairness tests, and assessments (like IAMA).
Transparency, we think, can be fostered by sharing reports from such a tool in a standardized way.
There are several existing open source tools which we have assessed. Some support only assessments, others already combine more features and can generate a report. There is however no tool that supports all the requirements we have.
These are the main requirements for our tool:
We will build our own solution. Where possible this solution should be able to re-use certain components of other related open-source projects.
"},{"location":"projects/tad/adrs/0003-tad-tool/#risks","title":"Risks","text":"We can develop a solution that is tailored to the needs of our stakeholders.
"},{"location":"projects/tad/adrs/0004-software-stack/","title":"TAD-0004 Software Stack for TAD","text":""},{"location":"projects/tad/adrs/0004-software-stack/#context","title":"Context","text":"For building our own TAD solution, we need to choose a software stack. During our earlier POCs and market research, we gathered insight and information on technologies to use and which not to use.
During further discussions and brainstorm sessions, a software stack was chosen that accommodates our needs best.
While more fine-grained requirements are listed elsewhere, some key requirements are:
We stick to suitable programming languages. As most AI-related tooling is written in Python, this language is the logical choice for our development as well.
Currently we do not see the need for a separate web GUI framework. It is preferred to bundle backend and frontend in one solution.
As part of a Dutch government organization, we need to adhere to all Dutch laws and standards, like:
We will support the latest 3 minor versions of Python v3 as programming language and Poetry for dependency management.
"},{"location":"projects/tad/adrs/0004-software-stack/#backend","title":"Backend","text":"The Python backend will use the following key dependencies:
We will use server-side rendering of HTML, based on HTMX. For styling and components we will use the NL Design System.
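A minimal sketch of this server-side-rendering approach; the use of FastAPI and the endpoint names here are assumptions for illustration only.

```python
# Sketch of server-side rendering with HTMX: the server returns HTML
# fragments that HTMX swaps into the page, so no separate frontend
# framework is needed. FastAPI is an assumed choice for this sketch.
from fastapi import FastAPI
from fastapi.responses import HTMLResponse

app = FastAPI()

@app.get("/", response_class=HTMLResponse)
def index() -> str:
    return """
    <html>
      <head><script src="https://unpkg.com/htmx.org"></script></head>
      <body>
        <button hx-get="/fragment" hx-swap="outerHTML">Load report</button>
      </body>
    </html>
    """

@app.get("/fragment", response_class=HTMLResponse)
def fragment() -> str:
    # Rendered on the server; HTMX swaps it in place of the button.
    return "<p>Rendered on the server.</p>"
```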
"},{"location":"projects/tad/adrs/0004-software-stack/#testing","title":"Testing","text":"We will use pytest for unit-testing and VCRPY and Playwright for module and integration tests.
"},{"location":"projects/tad/adrs/0004-software-stack/#database","title":"Database","text":"We will use SQLModel or SQL Alchemy with SQLite for development and postgreSQL for production.
"},{"location":"projects/tad/adrs/0004-software-stack/#risks","title":"Risks","text":"As HTMX is relatively more limited than other UI frameworks, it may lack features we require but did not anticipate.
"},{"location":"projects/tad/adrs/0004-software-stack/#consequences","title":"Consequences","text":"We have clarity about the tools to use and develop our TAD tool.
"},{"location":"projects/tad/adrs/0005-ai-verify-technical-tests/","title":"TAD-0005 Add support to run technical tests via AI Verify","text":""},{"location":"projects/tad/adrs/0005-ai-verify-technical-tests/#context","title":"Context","text":"The AI Verify project is set up in a modular way, and the technical tests is one of the modules. The AI Verify team is developing a feature which makes it possible to run the technical tests using an API: a Python library with a method to run a test and providing the required configuration; for example, which model and dataset to use and some test specific configuration.
The results of the test are returned in JSON format, which can be processed in any way we please, like writing them to a file or System Card, or storing them in a database.
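A hypothetical sketch of what calling such an API could look like; the module, function name, and configuration keys below are assumptions, since the feature is still being developed by the AI Verify team.

```python
# Hypothetical sketch only: the aiverify module, run_test function, and
# configuration keys are assumptions; the real interface may differ.
import json

from aiverify import run_test  # assumed import path

result = run_test(
    test="fairness_metrics_toolbox",            # which technical test to run
    model_path="models/example_model.pkl",      # which model to use
    data_path="data/example_dataset.csv",       # which dataset to use
    config={"sensitive_features": ["gender"]},  # test-specific configuration
)

# The JSON result can be written to a file or System Card, or stored
# in a database.
with open("fairness_result.json", "w", encoding="utf-8") as f:
    json.dump(result, f, indent=2)
```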
"},{"location":"projects/tad/adrs/0005-ai-verify-technical-tests/#pros","title":"Pros","text":"Our technical tests will include, but may extend beyond, those offered by AI Verify.
"},{"location":"projects/tad/adrs/0005-ai-verify-technical-tests/#risks","title":"Risks","text":"The tests we use from AI Verify are tied to the AI Verify ecosystem. So it uses their (core) modules to load models and datasets. Adding support for other models or data formats, like models written in R, has to be done in the AI Verify core.
"},{"location":"projects/tad/adrs/0005-ai-verify-technical-tests/#consequences","title":"Consequences","text":"We have a set of technical tests we can integrate in the TAD tool.
"},{"location":"projects/tad/adrs/0006-extend-system-card-EU-AI-Act/","title":"TAD-0006 Include EU AI Act into System Card","text":""},{"location":"projects/tad/adrs/0006-extend-system-card-EU-AI-Act/#context","title":"Context","text":"The European Union AI Act represents a landmark regulatory framework aimed at ensuring the safe and ethical development and deployment of artificial intelligence technologies within the EU. It defines different policies and requirements for AI systems based on their risk levels, from minimal to unacceptable, to mitigate potential harms. Only for high-risk AI systems, an extended form of documentation is required, including technical documentation. This technical documentation consists of a general description of the AI system and a more detailed, in-depth description (including risk-management, monitoring, etc.).
To ensure that AI systems can be effectively audited, we aim to create a separate instrument called 'technical documentation for high-risk AI systems'. This will allow developers to easily extract and auditors to readily assess all necessary information for the technical documentation.
The RegCheck AI tool, published by Hugging Face, checks model cards for compliance with the EU AI Act. However, this prototype tool is research work and not a commercial or legal product. Furthermore, because we use a modified model card setup, its performance may be less reliable.
"},{"location":"projects/tad/adrs/0006-extend-system-card-EU-AI-Act/#assumptions","title":"Assumptions","text":"The extended system card and proposed instrument will facilitate the documentation of information in accordance with the EU AI Act using the TAD tool.
"},{"location":"projects/tad/existing-tools/checklists/ai_assesment_tool_checklist/","title":"ALTAI","text":"See the introduction. It is a discussion tool about AI Systems.
"},{"location":"projects/tad/existing-tools/checklists/ai_assesment_tool_checklist/#functionality","title":"Functionality","text":"Requirement Priority Fulfilled Comments The tool allows users to conduct technical tests on algorithms or models, including assessments of performance, bias, and fairness. To facilitate these tests, users can input relevant datasets, M 0 The tool only allows for discussions not technical tests The tool allows users to choose which tests to perform. M 0 See above The tool allows users to fill out questionnaires to conduct impact assessments for AI. For example IAMA or ALTAI. M 1 This is very well supported by the tool The tool can generate a human readable report. M 0.9 There is an export functionality for the outcomes of the assessment, it offers a print dialog The tools works with a standardized report format, that it can read, write, and update. M 0 This report cannot be re-imported in a different tool as it only exports to pdf The tool supports plugin functionality so additional tests can be added easily. S 0 Not applicable The tool allows to create custom reports based on components. S 0 The report cannot be customized by the user It is possible to add custom components for reports. S 0 See above The tool provides detailed logging, including tracking of different model versions, changes in impact assessments, and technical test results for individual runs. S 0.75 There is even for the users an extensive audit trail what happened to assessment, not different model versions The tool supports saving progress. S 1 Yes this is supported The tool can be used on an isolated system without an internet connection. S 1 Yes it can be ran locally or in a docker container without internet The tool offers options to discuss and document conversations. For example, to converse about technical tests or to collaborate on impact assessments. C 1 This is the main feature of the tool The tool operates with complete data privacy; it does not share any data or logging information. C 1 Stored locally in a mongoDB The tool allows extension of report formats functionality. C 0.5 It could be developed that we export to markdown instead of pdf, but right now it just prints the window as pdf The tool can be integrated in a CI/CD flow. C 0 It is an UI tool, so doesn't make sense in a CI/CD pipeline The tool can be offered as a (cloud) service where no local installation is required. C 1 We could host this tool for other parties to use It is possible to define and automate workflows for repetitive tasks. C 0 It is an UI tool The tool offers pre-built connectors or low-code/no-code integration options to simplify the integration process. C 0 Nototal_score = 22.85
"},{"location":"projects/tad/existing-tools/checklists/ai_assesment_tool_checklist/#reliability","title":"Reliability","text":"Requirement Priority Fulfilled Comments The tool operates consistently and reliably, meaning it delivers the same expected results every time you use it. M 1 Yes The tool recovers automatically from common failures. S 1 The tool seems too do this The tool recovers from failures quickly, minimizing data loss, for example by automatically saving intermediate test progress results. S 1 The data is stored in mongoDB, so no data is lost The tool handles errors gracefully and informs users of any issues. S 1 If the email server is down the tool still operates The tool provides clear error messages and instructions for troubleshooting. S 0.8 Some errors are not very informative when you get them, but mostly email related aretotal_score = 15.4
"},{"location":"projects/tad/existing-tools/checklists/ai_assesment_tool_checklist/#usability","title":"Usability","text":"Requirement Priority Fulfilled Comments The tool possess a clean, intuitive, and visually appealing UI that follows industry standards. S 1 Very clean UI The tool provides clear and consistent navigation, making it easy for users to find what they need. S 1 Compared to AIVerify the navigation is very intuitive (but it also has less features) The tool is responsive and provides instant feedback. S 1 Yes The user interface is multilingual and supports at least English. S 0.8 There is support for multilingual, but the assessments are not translated and needs to be translated by hand The tool offers keyboard shortcuts for efficient interaction. C 0 No The user interface can easily be translated into other languages. C 0.8 The buttons are automatically translated but not the assessment itselftotal_score = 13
"},{"location":"projects/tad/existing-tools/checklists/ai_assesment_tool_checklist/#help-documentation","title":"Help & Documentation","text":"Requirement Priority Fulfilled Comments The tool provides comprehensive online help documentation with searchable functionalities. S 0.1 There is little documentation, only the website and the github readme The tool offers context-sensitive help within the application. C 0 The icons are just very clear, would be nice to have a question mark at some places The online documentation includes video tutorials and training materials for ease of learning. C 0 There is no such documentation The project provides readily available customer support through various channels (e.g., email, phone, online chat) to address user inquiries and troubleshoot issues. C 0.25 You can issue tickets on Github, no other way supported waytotal_score = 0.55
"},{"location":"projects/tad/existing-tools/checklists/ai_assesment_tool_checklist/#performance-efficiency","title":"Performance Efficiency","text":"Requirement Priority Fulfilled Comments The tool operates efficiently and minimize resource utilization. M 1 The docker container is not so very big, also doesn't use much resources The tool responds to user actions instantly. M 1 There is instant feedback in the UI The tool is scalable to accommodate increased user base and data volume. S 1 As it runs on Docker, you can scale this on Kubernetes for multiple userstotal_score = 11
"},{"location":"projects/tad/existing-tools/checklists/ai_assesment_tool_checklist/#maintainability","title":"Maintainability","text":"Requirement Priority Fulfilled Comments The tool is easy to modify and maintain. M 0.8 You need to be a bit aware of NextJS, then it is easy to maintain as it is not such a large tool The tool adheres to industry coding standards and best practices to ensure code quality and maintainability. M 0.8 The code looks well structured, they have deployments on github but I don't see any CI or pre-commit hooks The code is written in a common, widely adopted and supported and actively used and maintained programming language. M 1 NextJS is very common for frontend tools The project provides version control for code changes and rollback capabilities. M 1 The code is hosted on Github so yes The project is open source. M 1 see above It is possible to contribute to the source. S 1 It is possible, not many people have done this yet The system is modular, allowing for easy modification of individual components. S 0.6 Extra assessments can be appended to the system, but not in such a way that it supports multiple (different) assessments, but roles can be changed very easily Diagnostic tools are available to identify and troubleshoot issues. S 0.8 The standard NextJS tools to troubleshoot, but not many teststotal_score = 25.6
"},{"location":"projects/tad/existing-tools/checklists/ai_assesment_tool_checklist/#security","title":"Security","text":"Requirement Priority Fulfilled Comments The tool must protect data and system from unauthorized access, use, disclosure, disruption, modification, or destruction. M 1 The data is stored in MongoDB Regular security audits and penetration testing are conducted. S 0 When running docker compose up, the docker client will tell there are quite some CVE vulnerabilities in there, an upgrade of the Node version would help much here The tool enforce authorization controls based on user roles and permissions, restricting access to sensitive data and functionalities. C 0.5 The tool has support for multiple users and roles (but we couldn't find a user management system) Data encryption is used for sensitive information at rest and in transit. C 1 When data is transferred to mongoDB, a secure connection is set-up and also in the DB it is encrypted by MongoDB, also you have an SSL connection with the tool The project allows for regular security audits and penetration testing to identify vulnerabilities and ensure system integrity. C 1 The tool does allow this, as it is open-source The tool implements backup functionality to ensure data availability in case of incidents. C 1 The data is store in a volume next to the main container of thetotal_score = 7.5
"},{"location":"projects/tad/existing-tools/checklists/ai_assesment_tool_checklist/#compatibility","title":"Compatibility","text":"Requirement Priority Fulfilled Comments The tool is compatible with existing systems and infrastructure. M 1 As it is a container it can run on Kubernetes and therefore at Digilab The tool supports industry-standard data formats and protocols. M 1 Assessment and other config are stored in JSON The tool operates seamlessly on supported operating systems and hardware platforms. S 1 As it runs in a container it is able to run on all the major OSes if you have Docker Desktop or use a cloud version managed by yourself The tool supports commonly used data formats (e.g., CSV, Excel, JSON) for easy data exchange with other systems and tools. S 0 The tool currently only exports a pdf which is not an exchangeable format The tool integrates with existing security solutions. C 0 Not applicable as it is an UItotal_score = 11
"},{"location":"projects/tad/existing-tools/checklists/ai_assesment_tool_checklist/#accessibility","title":"Accessibility","text":"Requirement Priority Fulfilled Comments The tool is accessible to users with disabilities, following relevant accessibility standards (e.g., WCAG). S 0.1 The color scheme is pretty good viewable, but for the rest there are not accessibility featurestotal_score = 0.3
"},{"location":"projects/tad/existing-tools/checklists/ai_assesment_tool_checklist/#portability","title":"Portability","text":"Requirement Priority Fulfilled Comments The tool support a range of operating systems (e.g., Windows, macOS, Linux) commonly used within an organization. S 1 It is in docker so can run everywhere The tool minimizes dependencies on specific hardware or software configurations, promoting flexibility. S 1 This is all containerized The tool offers a cloud-based deployment option or be compatible with cloud environments for scalability and accessibility. S 1 As it is containerized we could host this ourselves in a cloud environment, the Belgium government does not offer a hosted version for you The tool adheres to relevant cloud security standards and best practices. S 0.8 The docker container does contain some outdated versions of for example Node.total_score = 11.4
"},{"location":"projects/tad/existing-tools/checklists/ai_assesment_tool_checklist/#deployment","title":"Deployment","text":"Requirement Priority Fulfilled Comments The tool has an easy and user-friendly installation and configuration process. S 1 It was very easy to install out-of-the-box The tool has on-premise or cloud-based deployment options to cater to different organizational needs and infrastructure. S 0 The tool does not promise on-prem or cloud-based managed deploymentstotal_score = 3
"},{"location":"projects/tad/existing-tools/checklists/ai_assesment_tool_checklist/#legal-compliance","title":"Legal & Compliance","text":"Requirement Priority Fulfilled Comments It is clear how the tool is funded to avoid improper influence due to conflicts of interest M 1 It is funded by the Belgian Government The tool is compliant with relevant legal and regulatory requirements. S 1 Yes EU license The tool adheres to (local) data privacy regulations like GDPR, ensuring the protection of user data. S 1 Data is stored locally The tool implements appropriate security measures to comply with industry regulations and standards. S 1 EUPL 1.2 license (although they say they have MIT license) The tool is licensed for use within the organization according to the terms and conditions of the license agreement. S 1 Yes, see above The tool respects intellectual property rights and avoid copyright infringement issues. S 1 Yes, see abovetotal_score = 19
"},{"location":"projects/tad/existing-tools/checklists/aiverify_checklist/","title":"AI Verify","text":"See the introduction
"},{"location":"projects/tad/existing-tools/checklists/aiverify_checklist/#functionality","title":"Functionality","text":"Requirement Priority Fulfilled Comments The tool allows users to conduct technical tests on algorithms or models, including assessments of performance, bias, and fairness. To facilitate these tests, users can input relevant datasets, M 1 This is core functionality of AIVerify The tool allows users to choose which tests to perform. M 1 This is core functionality of AIVerify The tool allows users to fill out questionnaires to conduct impact assessments for AI. For example IAMA or ALTAI. M 1 This is core functionality of AIVerify, however work is needed to add extra impact assessments The tool can generate a human readable report. M 1 This is core functionality of AIVerify The tools works with a standardized report format, that it can read, write, and update. M 0 The outputted format is a PDF format, so this cannot be updated, or easily read by a machine. The tool supports plugin functionality so additional tests can be added easily. S 0.5 One can add a test as a plugin, it can however be a bit too technical still for many people. The tool allows to create custom reports based on components. S 1 One can slide the technical tests results and the assessment test results into a report which will be placed into a PDF It is possible to add custom components for reports. S 1 It is possible, but just like with tests can be hard for non-technical people The tool provides detailed logging, including tracking of different model versions, changes in impact assessments, and technical test results for individual runs. S 0.5 There are versions of models when uploaded, and the report itself is the technical test result of a run. Changes to impact assessments are not logged (only when a report is generated) The tool supports saving progress. S 1 Reports can be saved, while it is being constructed The tool can be used on an isolated system without an internet connection. S 1 Locally the docker container can be build and ran The tool offers options to discuss and document conversations. For example, to converse about technical tests or to collaborate on impact assessments. C 0 Only the end-result will be logged into the report The tool operates with complete data privacy; it does not share any data or logging information. C 1 The application is a docker application and does not do this The tool allows extension of report formats functionality. C 1 We could program this functionality in the tool and submit a PR The tool can be integrated in a CI/CD flow. C 0.5 It is possible, but would be very heavy to do so. The build time is quite large, and only the technical tests could be ran in an automated fashion The tool can be offered as a (cloud) service where no local installation is required. C 0 AIVerify is currently not doing this, we could however offer it as a cloud service It is possible to define and automate workflows for repetitive tasks. C 0 As this tool is focused on UI, this is not possible The tool offers pre-built connectors or low-code/no-code integration options to simplify the integration process. C 0 This is not includedtotal_score = 36
"},{"location":"projects/tad/existing-tools/checklists/aiverify_checklist/#reliability","title":"Reliability","text":"Requirement Priority Fulfilled Comments The tool operates consistently and reliably, meaning it delivers the same expected results every time you use it. M 1 The tool did not break down a single time while we were coding a plugin (only threw errors) The tool recovers automatically from common failures. S 1 Common failures like missing datasets or models are not breaking The tool recovers from failures quickly, minimizing data loss, for example by automatically saving intermediate test progress results. S 0.5 The assessments you need to manually save otherwise it will be lost, but over different sessions the data will be stored persistent even if the containers go down. Test results are only stored in the generated report The tool handles errors gracefully and informs users of any issues. S 1 When failed to generate a report the tool will log the error messages, otherwise when loading in data that is non existing the application (while not being very clear in error message) just continues with an error The tool provides clear error messages and instructions for troubleshooting. S 0.5 The test-engine-core is a dependency that is installed as a package, and therefore the error message will not contain error in that packagetotal_score = 13
"},{"location":"projects/tad/existing-tools/checklists/aiverify_checklist/#usability","title":"Usability","text":"Requirement Priority Fulfilled Comments The tool possess a clean, intuitive, and visually appealing UI that follows industry standards. S 1 The tool does follow the material design principles for example when you hover over items they will respond to user input The tool provides clear and consistent navigation, making it easy for users to find what they need. S 0.5 It is not completely clear where in the tool you are when interacting with it and sometimes you could go back to home but not always The tool is responsive and provides instant feedback. S 1 Even for jobs like generating tests and the report, it scheduled jobs and will notify you when it is done The user interface is multilingual and supports at least English. S 0.5 Currently it only supports english The tool offers keyboard shortcuts for efficient interaction. C 0 It is mainly UI and therefore no keyboard shortcuts The user interface can easily be translated into other languages. C 0.2 It would need quite some refactoring when adding support for the Dutch Language (especially the more technical words like Warning or the metadata on all the pluginstotal_score = 9.4
"},{"location":"projects/tad/existing-tools/checklists/aiverify_checklist/#help-documentation","title":"Help & Documentation","text":"Requirement Priority Fulfilled Comments The tool provides comprehensive online help documentation with searchable functionalities. S 0.8 From the end-user perspective yes, from the development perspective no (for example that you need to rebuild packages like the test-engine-core The tool offers context-sensitive help within the application. C 0 Not included in the tool The online documentation includes video tutorials and training materials for ease of learning. C 0 Although it contains many images The project provides readily available customer support through various channels (e.g., email, phone, online chat) to address user inquiries and troubleshoot issues. C 0.2 Just email, which they do not respond to very quicklytotal_score = 2.8
"},{"location":"projects/tad/existing-tools/checklists/aiverify_checklist/#performance-efficiency","title":"Performance Efficiency","text":"Requirement Priority Fulfilled Comments The tool operates efficiently and minimize resource utilization. M 0.5 The tool is efficient, minimal waiting and no lag although it uses up quite some resources which could be optimized The tool responds to user actions instantly. M 1 Instantaneous response time The tool is scalable to accommodate increased user base and data volume. S 0.5 As it is built into a container it can be made scalable with Kubernetes, but the the tool itself can become very slow when generating results for a large dataset and model (because of the extra overhead)total_score = 7.5
"},{"location":"projects/tad/existing-tools/checklists/aiverify_checklist/#maintainability","title":"Maintainability","text":"Requirement Priority Fulfilled Comments The tool is easy to modify and maintain. M 0.2 Adding a new plugin for a model type was quite hard, other plugins however are more easier The tool adheres to industry coding standards and best practices to ensure code quality and maintainability. M 0.2 The docker side of the project could have a big improvement The code is written in a common, widely adopted and supported and actively used and maintained programming language. M 1 Backend in Python, Frontend in NextJs The project provides version control for code changes and rollback capabilities. M 0.8 The code is stored on Github, but the container itself not and also the packages which the tools depend on not The project is open source. M 1 Github link It is possible to contribute to the source. S 0.5 It is possible, although with our three features it takes a while for them to dedicated time for integration The system is modular, allowing for easy modification of individual components. S 0.5 The technical tests and assessments are easy to adjust, other core features not Diagnostic tools are available to identify and troubleshoot issues. S 0 Diagnosing some parts of the system took us quite some time as we couldn't properly debug in the containerstotal_score = 15.8
"},{"location":"projects/tad/existing-tools/checklists/aiverify_checklist/#security","title":"Security","text":"Requirement Priority Fulfilled Comments The tool must protect data and system from unauthorized access, use, disclosure, disruption, modification, or destruction. M 0.5 This managed by that the data is stored in MongoDB however, it currently only has 1 user support Regular security audits and penetration testing are conducted. S 0.1 We are unaware of the security audits but they do have a security policy here The tool enforce authorization controls based on user roles and permissions, restricting access to sensitive data and functionalities. C 0 Currently only 1 user can use the system and see all the data Data encryption is used for sensitive information at rest and in transit. C 1 When data is transferred to mongoDB, a secure connection is set-up and also in the DB it is encrypted by MongoDB, also you have an SSL connection with the tool The project allows for regular security audits and penetration testing to identify vulnerabilities and ensure system integrity. C 1 As you can install it locally, this is possible The tool implements backup functionality to ensure data availability in case of incidents. C 1 Data is stored persistent, so even if the tool breaks the data will be in volumestotal_score = 8.3
"},{"location":"projects/tad/existing-tools/checklists/aiverify_checklist/#compatibility","title":"Compatibility","text":"Requirement Priority Fulfilled Comments The tool is compatible with existing systems and infrastructure. M 1 As it is a container it can run on Kubernetes and therefore at Digilab The tool supports industry-standard data formats and protocols. M 1 Most Datasets and Models are supported by the tool The tool operates seamlessly on supported operating systems and hardware platforms. S 1 As it runs in a container it is able to run on all the major OS'es if you have Docker Desktop or use a cloud version managed by yourself The tool supports commonly used data formats (e.g., CSV, Excel, JSON) for easy data exchange with other systems and tools. S 0.5 As input many types are accepted, but only as export there is a PDF report The tool integrates with existing security solutions. C 0 It does not integrate with security solutionstotal_score = 12.5
"},{"location":"projects/tad/existing-tools/checklists/aiverify_checklist/#accessibility","title":"Accessibility","text":"Requirement Priority Fulfilled Comments The tool is accessible to users with disabilities, following relevant accessibility standards (e.g., WCAG). S 0 It is not clear what the tool actually does with one look, also the color change when hovering over elements is not a large difference compared to the original color (the purple and pink)total_score = 0
"},{"location":"projects/tad/existing-tools/checklists/aiverify_checklist/#portability","title":"Portability","text":"Requirement Priority Fulfilled Comments The tool support a range of operating systems (e.g., Windows, macOS, Linux) commonly used within an organization. S 1 It is containerized The tool minimizes dependencies on specific hardware or software configurations, promoting flexibility. S 1 This is all containerized The tool offers a cloud-based deployment option or be compatible with cloud environments for scalability and accessibility. S 1 As it is containerized we could host this ourselves in a cloud environment The tool adheres to relevant cloud security standards and best practices. S 0.5 The making of the container it self is lacking some best practices, otherwise the cloud security standards are not applicable as it is a self-hosted tooltotal_score = 10.5
"},{"location":"projects/tad/existing-tools/checklists/aiverify_checklist/#deployment","title":"Deployment","text":"Requirement Priority Fulfilled Comments The tool has an easy and user-friendly installation and configuration process. S 0.5 You need to be technical to be able to install and deploy, but then it is relatively easy The tool has on-premise or cloud-based deployment options to cater to different organizational needs and infrastructure. S 0 The tool does not promise on-prem or cloud-based managed deploymentstotal_score = 1.5
"},{"location":"projects/tad/existing-tools/checklists/aiverify_checklist/#legal-compliance","title":"Legal & Compliance","text":"Requirement Priority Fulfilled Comments It is clear how the tool is funded to avoid improper influence due to conflicts of interest M 1 On the website it is stated, that many commercial partners fund this project The tool is compliant with relevant legal and regulatory requirements. S 1 The tool adheres to (local) data privacy regulations like GDPR, ensuring the protection of user data. S 1 The tool implements appropriate security measures to comply with industry regulations and standards. S 1 The tool is licensed for use within the organization according to the terms and conditions of the license agreement. S 1 Apache 2.0 license The tool respects intellectual property rights and avoid copyright infringement issues. S 1total_score = 19
"},{"location":"projects/tad/existing-tools/checklists/holisticai_checklist/","title":"Holistic AI","text":"See the introduction. It is a toolkit just like IBM-360-Toolkit for a data scientist to research bias and also to mitigate it immediately.
"},{"location":"projects/tad/existing-tools/checklists/holisticai_checklist/#functionality","title":"Functionality","text":"Requirement Priority Fulfilled Comments The tool allows users to conduct technical tests on algorithms or models, including assessments of performance, bias, and fairness. To facilitate these tests, users can input relevant datasets, M 1 The tests which can be executed are written here The tool allows users to choose which tests to perform. M 1 In code the user is free to choose any test The tool allows users to fill out questionnaires to conduct impact assessments for AI. For example IAMA or ALTAI. M 0 The tool only does technical tests The tool can generate a human readable report. M 0 The toolkit itself cannot make a human readable report, it only generates results which then needs to be interpreted The tools works with a standardized report format, that it can read, write, and update. M 0 The only format it outputs are specific numbers, so no standardized format or even een report format The tool supports plugin functionality so additional tests can be added easily. S 0 All the bias tests are put in a single script which making additional tests a bit cumbersome and leas developer-friendly The tool allows to create custom reports based on components. S 0 Does not allow reports export It is possible to add custom components for reports. S 0 Does not allow reports export The tool provides detailed logging, including tracking of different model versions, changes in impact assessments, and technical test results for individual runs. S 0 Not ouf of the box, but this could be written in code by the owner of the algorithm The tool supports saving progress. S 0 Not ouf of the box, but this could be written in code by the owner of the algorithm The tool can be used on an isolated system without an internet connection. S 1 As a python tool this is possible The tool offers options to discuss and document conversations. For example, to converse about technical tests or to collaborate on impact assessments. C 0 This is not supported The tool operates with complete data privacy; it does not share any data or logging information. C 1 The local tool does not share anything to the outside world The tool allows extension of report formats functionality. C 0 This is not what the tool is built for The tool can be integrated in a CI/CD flow. C 1 As it is a python package it can be included in a CI pipeline The tool can be offered as a (cloud) service where no local installation is required. C 0 Not immediately, an UI needs to be build around it It is possible to define and automate workflows for repetitive tasks. C 1 Automated tests could be programmed specifically from this tool The tool offers pre-built connectors or low-code/no-code integration options to simplify the integration process. C 0 Not supported by the tooltotal_score = 17
"},{"location":"projects/tad/existing-tools/checklists/holisticai_checklist/#reliability","title":"Reliability","text":"Requirement Priority Fulfilled Comments The tool operates consistently and reliably, meaning it delivers the same expected results every time you use it. M 1 The tool recovers automatically from common failures. S 1 The tool recovers from failures quickly, minimizing data loss, for example by automatically saving intermediate test progress results. S 1 The tool handles errors gracefully and informs users of any issues. S 1 The tool provides clear error messages and instructions for troubleshooting. S 1total_score = 16
"},{"location":"projects/tad/existing-tools/checklists/holisticai_checklist/#usability","title":"Usability","text":"Requirement Priority Fulfilled Comments The tool possess a clean, intuitive, and visually appealing UI that follows industry standards. S 0 There is no user-interface The tool provides clear and consistent navigation, making it easy for users to find what they need. S 0 There is no user-interface The tool is responsive and provides instant feedback. S 0 There is no user-interface The user interface is multilingual and supports at least English. S 0 There is no user-interface The tool offers keyboard shortcuts for efficient interaction. C 0 There is no user-interface The user interface can easily be translated into other languages. C 0 There is no user-interfacetotal_score = 0
"},{"location":"projects/tad/existing-tools/checklists/holisticai_checklist/#help-documentation","title":"Help & Documentation","text":"Requirement Priority Fulfilled Comments The tool provides comprehensive online help documentation with searchable functionalities. S 0.2 There is some documentation but it is not very helpful The tool offers context-sensitive help within the application. C 0 As a Python tool, no The online documentation includes video tutorials and training materials for ease of learning. C 0 Ths is not there The project provides readily available customer support through various channels (e.g., email, phone, online chat) to address user inquiries and troubleshoot issues. C 0.5 You can contact sales through their website and respond on Github, Github seems to be an okay response time (but not a large community)total_score = 1.6
"},{"location":"projects/tad/existing-tools/checklists/holisticai_checklist/#performance-efficiency","title":"Performance Efficiency","text":"Requirement Priority Fulfilled Comments The tool operates efficiently and minimize resource utilization. M 1 very lightweight as a python package The tool responds to user actions instantly. M 1 It will return output instantly The tool is scalable to accommodate increased user base and data volume. S 1 This would be installed distributed and therefore would be scalable, with large datasets it is still very quicktotal_score = 11
"},{"location":"projects/tad/existing-tools/checklists/holisticai_checklist/#maintainability","title":"Maintainability","text":"Requirement Priority Fulfilled Comments The tool is easy to modify and maintain. M 0.5 It is less modular because most of the tests are written in a single script The tool adheres to industry coding standards and best practices to ensure code quality and maintainability. M 0.5 They use pre-commit hooks, but the codebase seems to be a bit weirdly structured The code is written in a common, widely adopted and supported and actively used and maintained programming language. M 1 It is written in Python The project provides version control for code changes and rollback capabilities. M 1 It is hosted on Github The project is open source. M 1 Hosted here It is possible to contribute to the source. S 1 It is possible and they respond to contributions The system is modular, allowing for easy modification of individual components. S 0.5 See the first point Diagnostic tools are available to identify and troubleshoot issues. S 1 Just standard python troubleshooting toolstotal_score = 23.5
"},{"location":"projects/tad/existing-tools/checklists/holisticai_checklist/#security","title":"Security","text":"Requirement Priority Fulfilled Comments The tool must protect data and system from unauthorized access, use, disclosure, disruption, modification, or destruction. M 0 Not applicable Regular security audits and penetration testing are conducted. S 0 It is not stated on the repository that they do something with security The tool enforce authorization controls based on user roles and permissions, restricting access to sensitive data and functionalities. C 0 The tool does not have Users or Access control Data encryption is used for sensitive information at rest and in transit. C 0 Transitionary data is not stored The project allows for regular security audits and penetration testing to identify vulnerabilities and ensure system integrity. C 1 This is not blocked by the tool The tool implements backup functionality to ensure data availability in case of incidents. C 0 Not supportedtotal_score = 2
"},{"location":"projects/tad/existing-tools/checklists/holisticai_checklist/#compatibility","title":"Compatibility","text":"Requirement Priority Fulfilled Comments The tool is compatible with existing systems and infrastructure. M 1 It can be imported in Python The tool supports industry-standard data formats and protocols. M 0 it does not standardize at all in the output of the tests The tool operates seamlessly on supported operating systems and hardware platforms. S 1 Python can be ran on any system The tool supports commonly used data formats (e.g., CSV, Excel, JSON) for easy data exchange with other systems and tools. S 1 If it can be imported in Python/R it is supported The tool integrates with existing security solutions. C 0 Not applicabletotal_score = 10
"},{"location":"projects/tad/existing-tools/checklists/holisticai_checklist/#accessibility","title":"Accessibility","text":"Requirement Priority Fulfilled Comments The tool is accessible to users with disabilities, following relevant accessibility standards (e.g., WCAG). S 0 You need to be a programmer to use it, and that is not your typical user with disabilitiestotal_score = 0
"},{"location":"projects/tad/existing-tools/checklists/holisticai_checklist/#portability","title":"Portability","text":"Requirement Priority Fulfilled Comments The tool support a range of operating systems (e.g., Windows, macOS, Linux) commonly used within an organization. S 0.5 As it is a python tool it is supported anywhere python runs The tool minimizes dependencies on specific hardware or software configurations, promoting flexibility. S 1 It is a python tool The tool offers a cloud-based deployment option or be compatible with cloud environments for scalability and accessibility. S 1 The company behind Holistic AI offers a whole range of services included an UI which uses this open-source toolkit The tool adheres to relevant cloud security standards and best practices. S 0 On their website they do not speak about where the data of their solution will go, this is not very transparenttotal_score = 7.5
"},{"location":"projects/tad/existing-tools/checklists/holisticai_checklist/#deployment","title":"Deployment","text":"Requirement Priority Fulfilled Comments The tool has an easy and user-friendly installation and configuration process. S 0.2 You need to have some developer knowledge and also knowledge about the technical tests to use The tool has on-premise or cloud-based deployment options to cater to different organizational needs and infrastructure. S 1 Yes the tool can be used as a cloud-based deployment but then with a whole UI around ittotal_score = 3.6
"},{"location":"projects/tad/existing-tools/checklists/holisticai_checklist/#legal-compliance","title":"Legal & Compliance","text":"Requirement Priority Fulfilled Comments It is clear how the tool is funded to avoid improper influence due to conflicts of interest M 1 The tool is owned by a private company but has been made open source to the public The tool is compliant with relevant legal and regulatory requirements. S 1 Under the apache 2.0 license The tool adheres to (local) data privacy regulations like GDPR, ensuring the protection of user data. S 1 Data stays locally The tool implements appropriate security measures to comply with industry regulations and standards. S 0 The repo does not speak about security at all The tool is licensed for use within the organization according to the terms and conditions of the license agreement. S 1 Under the apache 2.0 license The tool respects intellectual property rights and avoid copyright infringement issues. S 1total_score = 16
"},{"location":"projects/tad/existing-tools/checklists/ibm_360_research_toolkit_checklist/","title":"IBM Research 360 Toolkit","text":"See the introduction, same thing as verifyML this has no frontend baked in, but has some nice integrations with MLops tooling like Kubeflow Pipelines. The IBM Research 360 toolkit is actually a collection of three open-source toolkits as stated by their Github repo; AI Fairness 360, AI Explainability 360, Adversarial Robustness 360. The strong suite of this toolkit that it considers bias in the whole lifecycle of the model; (dataset, training, output).
"},{"location":"projects/tad/existing-tools/checklists/ibm_360_research_toolkit_checklist/#functionality","title":"Functionality","text":"Requirement Priority Fulfilled Comments The tool allows users to conduct technical tests on algorithms or models, including assessments of performance, bias, and fairness. To facilitate these tests, users can input relevant datasets, M 1 Fairness, Explainability and security can be tested with the suite of tools The tool allows users to choose which tests to perform. M 1 The websites of contain a whole explanation of which tests to perform AIF Website, AIX website, ART website The tool allows users to fill out questionnaires to conduct impact assessments for AI. For example IAMA or ALTAI. M 0 The tool only does technical tests The tool can generate a human readable report. M 0 The toolkit itself cannot make a human readable report, it only generates results which then needs to be interpreted The tools works with a standardized report format, that it can read, write, and update. M 0 The only format it outputs are specific numbers, so no standardized format or even een report format The tool supports plugin functionality so additional tests can be added easily. S 1 Only the repository new tests could be added quite easily if you understand Python The tool allows to create custom reports based on components. S 0 The tool does not generate reports It is possible to add custom components for reports. S 0 The tool does not generate reports The tool provides detailed logging, including tracking of different model versions, changes in impact assessments, and technical test results for individual runs. S 0 Not ouf of the box, but this could be written in code by the owner of the algorithm The tool supports saving progress. S 0 Not ouf of the box, but this could be written in code by the owner of the algorithm The tool can be used on an isolated system without an internet connection. S 1 As it can be imported as a python or r library The tool offers options to discuss and document conversations. For example, to converse about technical tests or to collaborate on impact assessments. C 0 This is not supported, there is no UI The tool operates with complete data privacy; it does not share any data or logging information. C 1 The tool does not share data The tool allows extension of report formats functionality. C 0 The tool does not generate reports The tool can be integrated in a CI/CD flow. C 1 As it is a programming toolkit it can be used in a CI/CD pipeline The tool can be offered as a (cloud) service where no local installation is required. C 0 not immediately, then an UI needs to be made It is possible to define and automate workflows for repetitive tasks. C 1 We could automate specific tests which we deem necessary or standard The tool offers pre-built connectors or low-code/no-code integration options to simplify the integration process. C 0 Purely written in Pythontotal_score = 20
"},{"location":"projects/tad/existing-tools/checklists/ibm_360_research_toolkit_checklist/#reliability","title":"Reliability","text":"Requirement Priority Fulfilled Comments The tool operates consistently and reliably, meaning it delivers the same expected results every time you use it. M 1 The tool recovers automatically from common failures. S 1 The tool recovers from failures quickly, minimizing data loss, for example by automatically saving intermediate test progress results. S 1 The tool handles errors gracefully and informs users of any issues. S 1 The tool provides clear error messages and instructions for troubleshooting. S 1total_score = 16
"},{"location":"projects/tad/existing-tools/checklists/ibm_360_research_toolkit_checklist/#usability","title":"Usability","text":"Requirement Priority Fulfilled Comments The tool possess a clean, intuitive, and visually appealing UI that follows industry standards. S 0 There is no user-interface The tool provides clear and consistent navigation, making it easy for users to find what they need. S 0 There is no user-interface The tool is responsive and provides instant feedback. S 0 There is no user-interface The user interface is multilingual and supports at least English. S 0 There is no user-interface The tool offers keyboard shortcuts for efficient interaction. C 0 There is no user-interface The user interface can easily be translated into other languages. C 0 There is no user-interfacetotal_score = 0
"},{"location":"projects/tad/existing-tools/checklists/ibm_360_research_toolkit_checklist/#help-documentation","title":"Help & Documentation","text":"Requirement Priority Fulfilled Comments The tool provides comprehensive online help documentation with searchable functionalities. S 0.8 On the website of the specific toolkit you can find many docs but you cannot search The tool offers context-sensitive help within the application. C 0 Within the application (as it is not an UI, does not offer specific help) The online documentation includes video tutorials and training materials for ease of learning. C 1 The amount of tutorials is extensive even videos of its usage The project provides readily available customer support through various channels (e.g., email, phone, online chat) to address user inquiries and troubleshoot issues. C 1 You can ask questions at the repository, but also in slack and many people are using thistotal_score = 6.4
"},{"location":"projects/tad/existing-tools/checklists/ibm_360_research_toolkit_checklist/#performance-efficiency","title":"Performance Efficiency","text":"Requirement Priority Fulfilled Comments The tool operates efficiently and minimize resource utilization. M 1 very lightweight as a python package The tool responds to user actions instantly. M 1 It will return output instantly The tool is scalable to accommodate increased user base and data volume. S 1 This would be installed distributed and therefore would be scalable, with large datasets it is still very quicktotal_score = 11
"},{"location":"projects/tad/existing-tools/checklists/ibm_360_research_toolkit_checklist/#maintainability","title":"Maintainability","text":"Requirement Priority Fulfilled Comments The tool is easy to modify and maintain. M 1 The repositories are very well structured and therefore easy to adjust The tool adheres to industry coding standards and best practices to ensure code quality and maintainability. M 1 Although it doesn't have pre-commit hooks it does have a CONTRIBUTING.rst where the rules of good practices are written down The code is written in a common, widely adopted and supported and actively used and maintained programming language. M 1 It is written in Python The project provides version control for code changes and rollback capabilities. M 1 The code is hosted on Github The project is open source. M 1 At the beginning of this doc you can find the links to the repositories It is possible to contribute to the source. S 1 They have merged many outside requests, so this is fine The system is modular, allowing for easy modification of individual components. S 1 Tests can very easily be added if you understand Python Diagnostic tools are available to identify and troubleshoot issues. S 1 Just standard python troubleshooting toolstotal_score = 29
"},{"location":"projects/tad/existing-tools/checklists/ibm_360_research_toolkit_checklist/#security","title":"Security","text":"Requirement Priority Fulfilled Comments The tool must protect data and system from unauthorized access, use, disclosure, disruption, modification, or destruction. M 0 not applicable Regular security audits and penetration testing are conducted. S 0 It is not stated on the repository that they do something with security The tool enforce authorization controls based on user roles and permissions, restricting access to sensitive data and functionalities. C 0 The tool does not have Users or Access control Data encryption is used for sensitive information at rest and in transit. C 0 Transitionary data is not stored The project allows for regular security audits and penetration testing to identify vulnerabilities and ensure system integrity. C 1 This is not blocked by the tool The tool implements backup functionality to ensure data availability in case of incidents. C 0 Not supportedtotal_score = 2
"},{"location":"projects/tad/existing-tools/checklists/ibm_360_research_toolkit_checklist/#compatibility","title":"Compatibility","text":"Requirement Priority Fulfilled Comments The tool is compatible with existing systems and infrastructure. M 1 It can easily be imported in Python or R The tool supports industry-standard data formats and protocols. M 0.5 It does not standardize really on any output from the tests The tool operates seamlessly on supported operating systems and hardware platforms. S 1 As a python and R tool it can be run on systems where these can be ran The tool supports commonly used data formats (e.g., CSV, Excel, JSON) for easy data exchange with other systems and tools. S 1 These can be used if they are imported in python and R The tool integrates with existing security solutions. C 1 The Adversarial Robustness Toolbox can be used to test for the security of AI Systemstotal_score = 14
"},{"location":"projects/tad/existing-tools/checklists/ibm_360_research_toolkit_checklist/#accessibility","title":"Accessibility","text":"Requirement Priority Fulfilled Comments The tool is accessible to users with disabilities, following relevant accessibility standards (e.g., WCAG). S 0 You need to be a programmer to use it, and that is not your typical user with disabilitiestotal_score = 0
"},{"location":"projects/tad/existing-tools/checklists/ibm_360_research_toolkit_checklist/#portability","title":"Portability","text":"Requirement Priority Fulfilled Comments The tool support a range of operating systems (e.g., Windows, macOS, Linux) commonly used within an organization. S 0.7 If you can run python, which is not always possible within the government for example, but R could be more easy to be run on places The tool minimizes dependencies on specific hardware or software configurations, promoting flexibility. S 1 Just a python tool, no UI which is fairly minimal The tool offers a cloud-based deployment option or be compatible with cloud environments for scalability and accessibility. S 0 It is not offered as a cloud-based option The tool adheres to relevant cloud security standards and best practices. S 0 Not relevanttotal_score = 5.1
"},{"location":"projects/tad/existing-tools/checklists/ibm_360_research_toolkit_checklist/#deployment","title":"Deployment","text":"Requirement Priority Fulfilled Comments The tool has an easy and user-friendly installation and configuration process. S 0.4 You need to have some developer knowledge and also knowledge about the technical tests to use. But then it is quite easy and works fairly quickly The tool has on-premise or cloud-based deployment options to cater to different organizational needs and infrastructure. S 0 Not applicabletotal_score = 1.2
"},{"location":"projects/tad/existing-tools/checklists/ibm_360_research_toolkit_checklist/#legal-compliance","title":"Legal & Compliance","text":"Requirement Priority Fulfilled Comments It is clear how the tool is funded to avoid improper influence due to conflicts of interest M 1 The tool was from IBM, but slowly they are removing the IBM branding from this and the tool is now owned by the LF AI Foundation (where big companies are part of) The tool is compliant with relevant legal and regulatory requirements. S 1 All three tools have apache 2.0 license The tool adheres to (local) data privacy regulations like GDPR, ensuring the protection of user data. S 1 Data will stay local The tool implements appropriate security measures to comply with industry regulations and standards. S 0 Nothing is known about the security measures of the toolkits The tool is licensed for use within the organization according to the terms and conditions of the license agreement. S 1 All three tools have apache 2.0 license The tool respects intellectual property rights and avoid copyright infringement issues. S 1 The specific tests are implementations of papers which are open for everyonetotal_score = 16
"},{"location":"projects/tad/existing-tools/checklists/verifyml_checklist/","title":"VerifyML","text":"See the introduction, the maker also suggests to use an front-end tool to collaboratively change the model card. Model Card Editor this is not open-source and also the developer suggests in this issue to not use this tool but to use tools like AIVerify. This checklist only looks at the verifyML python toolkit and not the web interface.
"},{"location":"projects/tad/existing-tools/checklists/verifyml_checklist/#functionality","title":"Functionality","text":"Requirement Priority Fulfilled Comments The tool allows users to conduct technical tests on algorithms or models, including assessments of performance, bias, and fairness. To facilitate these tests, users can input relevant datasets, M 1 The tool does allow a few standardized tests, specified here The tool allows users to choose which tests to perform. M 1 In code the user is free to choose any test The tool allows users to fill out questionnaires to conduct impact assessments for AI. For example IAMA or ALTAI. M 0 The tool can generate a human readable report. M 1 The tool can visualize model cards that are generated by it The tools works with a standardized report format, that it can read, write, and update. M 1 It generates html which can be imported by a machine The tool supports plugin functionality so additional tests can be added easily. S 1 Any test can be ran by the user itself and the output imported in the model card generated by the tool The tool allows to create custom reports based on components. S 0 It doesn't offer any standardization in what to put in the report It is possible to add custom components for reports. S 1 Anything can be put in the model card, which makes it very flexible The tool provides detailed logging, including tracking of different model versions, changes in impact assessments, and technical test results for individual runs. S 0 Not ouf of the box, but this could be written in code by the owner of the algorithm The tool supports saving progress. S 1 Once the modelcard is generated it could be loaded in again and be changed The tool can be used on an isolated system without an internet connection. S 1 Once the tool is imported in python it can be used without an internet connection The tool offers options to discuss and document conversations. For example, to converse about technical tests or to collaborate on impact assessments. C 0 Assessments are not supported The tool operates with complete data privacy; it does not share any data or logging information. C 1 It does not do this The tool allows extension of report formats functionality. C 1 As it exports html, it can also be transferred to json or markdown The tool can be integrated in a CI/CD flow. C 1 The automated tests could be ran in the CI/CD tool to generated a model card The tool can be offered as a (cloud) service where no local installation is required. C 0 The python tool itself not, but a frontend which needs to be developed yes It is possible to define and automate workflows for repetitive tasks. C 1 As it is written in python this can be automated easily The tool offers pre-built connectors or low-code/no-code integration options to simplify the integration process. C 0 The tool does this nottotal_score = 42
"},{"location":"projects/tad/existing-tools/checklists/verifyml_checklist/#reliability","title":"Reliability","text":"Requirement Priority Fulfilled Comments The tool operates consistently and reliably, meaning it delivers the same expected results every time you use it. M 1 Once you have located the right (older) libraries it runs pretty smoothly and reliably The tool recovers automatically from common failures. S 0 Library dependencies needs to be solved by yourself as this is not handled by the tool (especially graphs) The tool recovers from failures quickly, minimizing data loss, for example by automatically saving intermediate test progress results. S 0 It does not store any intermediary results The tool handles errors gracefully and informs users of any issues. S 0 It just breaks, you need to explicitly export the model card for it to saved The tool provides clear error messages and instructions for troubleshooting. S 0 The error messages are python error messages unrelated to the tooltotal_score = 4
"},{"location":"projects/tad/existing-tools/checklists/verifyml_checklist/#usability","title":"Usability","text":"Requirement Priority Fulfilled Comments The tool possess a clean, intuitive, and visually appealing UI that follows industry standards. S 0 There is no user interface The tool provides clear and consistent navigation, making it easy for users to find what they need. S 0 There is no user interface The tool is responsive and provides instant feedback. S 0 There is no user interface The user interface is multilingual and supports at least English. S 0 There is no user interface The tool offers keyboard shortcuts for efficient interaction. C 0 There is no user interface The user interface can easily be translated into other languages. C 0 There is no user interfacetotal_score = 0
"},{"location":"projects/tad/existing-tools/checklists/verifyml_checklist/#help-documentation","title":"Help & Documentation","text":"Requirement Priority Fulfilled Comments The tool provides comprehensive online help documentation with searchable functionalities. S 0.5 The documentation is quite concise and helpful, but it is outdated The tool offers context-sensitive help within the application. C 0 No context info whatsoever The online documentation includes video tutorials and training materials for ease of learning. C 0 Just documentation The project provides readily available customer support through various channels (e.g., email, phone, online chat) to address user inquiries and troubleshoot issues. C 0 The people who worked on the tool are quick to respond to issues, but they don't support the tool anymoretotal_score = 1.5
"},{"location":"projects/tad/existing-tools/checklists/verifyml_checklist/#performance-efficiency","title":"Performance Efficiency","text":"Requirement Priority Fulfilled Comments The tool operates efficiently and minimize resource utilization. M 1 Very lightweight tool, as it is a python package The tool responds to user actions instantly. M 1 When run, it returns instantly The tool is scalable to accommodate increased user base and data volume. S 1 This would be installed distributed and therefore would be scalable, with large datasets it is still very quicktotal_score = 11
"},{"location":"projects/tad/existing-tools/checklists/verifyml_checklist/#maintainability","title":"Maintainability","text":"Requirement Priority Fulfilled Comments The tool is easy to modify and maintain. M 1 The tool itself it not so large and written with tools we are all quite aware of The tool adheres to industry coding standards and best practices to ensure code quality and maintainability. M 1 The repository has poetry, pre-commit hooks, has a CI, and looks well structured The code is written in a common, widely adopted and supported and actively used and maintained programming language. M 1 in Python and jupyter notebooks The project provides version control for code changes and rollback capabilities. M 1 It is hosted on Github The project is open source. M 1 Apache 2.0 license It is possible to contribute to the source. S 0 The project is not active supported anymore, so we would need to make a fork and make that the main source The system is modular, allowing for easy modification of individual components. S 0.5 The idea of a model card is pretty modular, and can be changed any way we like. Adding assessments in the tool would be quite the effort Diagnostic tools are available to identify and troubleshoot issues. S 1 Just standard python troubleshooting toolstotal_score = 24.5
"},{"location":"projects/tad/existing-tools/checklists/verifyml_checklist/#security","title":"Security","text":"Requirement Priority Fulfilled Comments The tool must protect data and system from unauthorized access, use, disclosure, disruption, modification, or destruction. M 0 not applicable Regular security audits and penetration testing are conducted. S 0 As the tool is not actively maintained anymore The tool enforce authorization controls based on user roles and permissions, restricting access to sensitive data and functionalities. C 0 As this is a local import only, this is managed by the developer Data encryption is used for sensitive information at rest and in transit. C 0 Intermediary data is not stored, and the end result is put in html with no encryption The project allows for regular security audits and penetration testing to identify vulnerabilities and ensure system integrity. C 1 It does not block this for users to do this The tool implements backup functionality to ensure data availability in case of incidents. C 0 Not supportedtotal_score = 2
"},{"location":"projects/tad/existing-tools/checklists/verifyml_checklist/#compatibility","title":"Compatibility","text":"Requirement Priority Fulfilled Comments The tool is compatible with existing systems and infrastructure. M 1 It can be easily imported and installed in python The tool supports industry-standard data formats and protocols. M 1 Standardized tests are used and the output format is html The tool operates seamlessly on supported operating systems and hardware platforms. S 1 As it is a python tool, anywhere where python can run this can also be run The tool supports commonly used data formats (e.g., CSV, Excel, JSON) for easy data exchange with other systems and tools. S 1 This can be imported The tool integrates with existing security solutions. C 0 It does not do such a thingtotal_score = 14
"},{"location":"projects/tad/existing-tools/checklists/verifyml_checklist/#accessibility","title":"Accessibility","text":"Requirement Priority Fulfilled Comments The tool is accessible to users with disabilities, following relevant accessibility standards (e.g., WCAG). S 0 You need to be a programmer to use it, and that is not your typical user with disabilitiestotal_score = 0
"},{"location":"projects/tad/existing-tools/checklists/verifyml_checklist/#portability","title":"Portability","text":"Requirement Priority Fulfilled Comments The tool support a range of operating systems (e.g., Windows, macOS, Linux) commonly used within an organization. S 0.5 If you can run python, which is not always possible within the government for example The tool minimizes dependencies on specific hardware or software configurations, promoting flexibility. S 1 As it is a python tool The tool offers a cloud-based deployment option or be compatible with cloud environments for scalability and accessibility. S 0 It is not offered as a cloud-based option The tool adheres to relevant cloud security standards and best practices. S 0 On the github nothing is mentioned about security and for the cloud version it is not applicabletotal_score = 4.5
"},{"location":"projects/tad/existing-tools/checklists/verifyml_checklist/#deployment","title":"Deployment","text":"Requirement Priority Fulfilled Comments The tool has an easy and user-friendly installation and configuration process. S 0.2 You need to have some developer knowledge and also knowledge about the technical tests to use The tool has on-premise or cloud-based deployment options to cater to different organizational needs and infrastructure. S 0 Not applicabletotal_score = 0.6
"},{"location":"projects/tad/existing-tools/checklists/verifyml_checklist/#legal-compliance","title":"Legal & Compliance","text":"Requirement Priority Fulfilled Comments It is clear how the tool is funded to avoid improper influence due to conflicts of interest M 1 It was developed during a competition and it does not receive funding anymore The tool is compliant with relevant legal and regulatory requirements. S 1 Under the apache 2.0 license The tool adheres to (local) data privacy regulations like GDPR, ensuring the protection of user data. S 1 Data will stay local The tool implements appropriate security measures to comply with industry regulations and standards. S 0 The repo does not speak about security at all The tool is licensed for use within the organization according to the terms and conditions of the license agreement. S 1 Under the apache 2.0 license The tool respects intellectual property rights and avoid copyright infringement issues. S 1total_score = 16
"},{"location":"projects/tad/existing-tools/comparison/requirements/","title":"Requirements for tools for Transparency of Algorithmic Decision making","text":"This document contains a checklist with requirements for tools we could use to help with the transparency of algorithmic decision making.
The requirements are based on:
The requirements have been given a priority based on the MoSCoW scale to allow for tool comparison.
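The total_score values in the checklists above appear to follow a simple weighted sum: each requirement's fulfilment value (0 to 1) is multiplied by a weight per MoSCoW priority. The weights are not stated explicitly in this document, but Must = 4, Should = 3, Could = 2 reproduces every published total (for example, AI Verify's Reliability checklist scores 4·1 + 3·(1 + 0.5 + 1 + 0.5) = 13). A sketch of that computation:

```python
# Weights per MoSCoW priority, inferred from the published totals above;
# they are an assumption, not stated explicitly in the source document.
WEIGHTS = {"M": 4, "S": 3, "C": 2}

def total_score(items):
    """items: iterable of (priority, fulfilled) pairs, fulfilled in [0, 1]."""
    return sum(WEIGHTS[priority] * fulfilled for priority, fulfilled in items)

# AI Verify's Reliability checklist from earlier in this document:
aiverify_reliability = [("M", 1), ("S", 1), ("S", 0.5), ("S", 1), ("S", 0.5)]
print(total_score(aiverify_reliability))  # 13.0, matching the listed total
```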
"},{"location":"projects/tad/existing-tools/comparison/requirements/#functionality","title":"Functionality","text":"Requirement Priority The tool allows users to conduct technical tests on algorithms or models, including assessments of performance, bias, and fairness. To facilitate these tests, users can input relevant datasets, M The tool allows users to choose which tests to perform. M The tool allows users to fill out questionnaires to conduct impact assessments for AI. For example IAMA or ALTAI. M The tool can generate a human readable report. M The tools works with a standardized report format, that it can read, write, and update. M The tool supports plugin functionality so additional tests can be added easily. S The tool allows to create custom reports based on components. S It is possible to add custom components for reports. S The tool provides detailed logging, including tracking of different model versions, changes in impact assessments, and technical test results for individual runs. S The tool supports saving progress. S The tool can be used on an isolated system without an internet connection. S The tool offers options to discuss and document conversations. For example, to converse about technical tests or to collaborate on impact assessments. C The tool operates with complete data privacy; it does not share any data or logging information. C The tool allows extension of report formats functionality. C The tool can be integrated in a CI/CD flow. C The tool can be offered as a (cloud) service where no local installation is required. C It is possible to define and automate workflows for repetitive tasks. C The tool offers pre-built connectors or low-code/no-code integration options to simplify the integration process. C"},{"location":"projects/tad/existing-tools/comparison/requirements/#reliability","title":"Reliability","text":"Requirement Priority The tool operates consistently and reliably, meaning it delivers the same expected results every time you use it. M The tool recovers automatically from common failures. S The tool recovers from failures quickly, minimizing data loss, for example by automatically saving intermediate test progress results. S The tool handles errors gracefully and informs users of any issues. S The tool provides clear error messages and instructions for troubleshooting. S"},{"location":"projects/tad/existing-tools/comparison/requirements/#usability","title":"Usability","text":"Requirement Priority The tool possess a clean, intuitive, and visually appealing UI that follows industry standards. S The tool provides clear and consistent navigation, making it easy for users to find what they need. S The tool is responsive and provides instant feedback. S The user interface is multilingual and supports at least English. S The tool offers keyboard shortcuts for efficient interaction. C The user interface can easily be translated into other languages. C"},{"location":"projects/tad/existing-tools/comparison/requirements/#help-documentation","title":"Help & Documentation","text":"Requirement Priority The tool provides comprehensive online help documentation with searchable functionalities. S The tool offers context-sensitive help within the application. C The online documentation includes video tutorials and training materials for ease of learning. C The project provides readily available customer support through various channels (e.g., email, phone, online chat) to address user inquiries and troubleshoot issues. 
C"},{"location":"projects/tad/existing-tools/comparison/requirements/#performance-efficiency","title":"Performance Efficiency","text":"Requirement Priority The tool operates efficiently and minimize resource utilization. M The tool responds to user actions instantly. M The tool is scalable to accommodate increased user base and data volume. S"},{"location":"projects/tad/existing-tools/comparison/requirements/#maintainability","title":"Maintainability","text":"Requirement Priority The tool is easy to modify and maintain. M The tool adheres to industry coding standards and best practices to ensure code quality and maintainability. M The code is written in a common, widely adopted and supported and actively used and maintained programming language. M The project provides version control for code changes and rollback capabilities. M The project is open source. M It is possible to contribute to the source. S The system is modular, allowing for easy modification of individual components. S Diagnostic tools are available to identify and troubleshoot issues. S"},{"location":"projects/tad/existing-tools/comparison/requirements/#security","title":"Security","text":"Requirement Priority The tool must protect data and system from unauthorized access, use, disclosure, disruption, modification, or destruction. M Regular security audits and penetration testing are conducted. S The tool enforce authorization controls based on user roles and permissions, restricting access to sensitive data and functionalities. C Data encryption is used for sensitive information at rest and in transit. C The project allows for regular security audits and penetration testing to identify vulnerabilities and ensure system integrity. C The tool implements backup functionality to ensure data availability in case of incidents. C"},{"location":"projects/tad/existing-tools/comparison/requirements/#compatibility","title":"Compatibility","text":"Requirement Priority The tool is compatible with existing systems and infrastructure. M The tool supports industry-standard data formats and protocols. M The tool operates seamlessly on supported operating systems and hardware platforms. S The tool supports commonly used data formats (e.g., CSV, Excel, JSON) for easy data exchange with other systems and tools. S The tool integrates with existing security solutions. C"},{"location":"projects/tad/existing-tools/comparison/requirements/#accessibility","title":"Accessibility","text":"Requirement Priority The tool is accessible to users with disabilities, following relevant accessibility standards (e.g., WCAG). S"},{"location":"projects/tad/existing-tools/comparison/requirements/#portability","title":"Portability","text":"Requirement Priority The tool support a range of operating systems (e.g., Windows, macOS, Linux) commonly used within an organization. S The tool minimizes dependencies on specific hardware or software configurations, promoting flexibility. S The tool offers a cloud-based deployment option or be compatible with cloud environments for scalability and accessibility. S The tool adheres to relevant cloud security standards and best practices. S"},{"location":"projects/tad/existing-tools/comparison/requirements/#deployment","title":"Deployment","text":"Requirement Priority The tool has an easy and user-friendly installation and configuration process. S The tool has on-premise or cloud-based deployment options to cater to different organizational needs and infrastructure. 
S"},{"location":"projects/tad/existing-tools/comparison/requirements/#legal-compliance","title":"Legal & Compliance","text":"Requirement Priority It is clear how the tool is funded to avoid improper influence due to conflicts of interest M The tool is compliant with relevant legal and regulatory requirements. S The tool adheres to (local) data privacy regulations like GDPR, ensuring the protection of user data. S The tool implements appropriate security measures to comply with industry regulations and standards. S The tool is licensed for use within the organization according to the terms and conditions of the license agreement. S The tool respects intellectual property rights and avoid copyright infringement issues. S"},{"location":"projects/tad/existing-tools/comparison/tools/","title":"Research of tools for Transparency of Algorithmic Decision making","text":"In our ongoing research on AI validation and transparency, we are seeking tools to support assessments. Ideal tools would combine various technical tests with checklists and questionnaires and have the ability to generate reports in both human-friendly and machine-exchangeable formats.
This document contains a list of tools we have found and may want to investigate further.
"},{"location":"projects/tad/existing-tools/comparison/tools/#ai-verify","title":"AI Verify","text":"AI Verify is an AI governance testing framework and software toolkit that validates the performance of AI systems against a set of internationally recognized principles through standardized tests, and is consistent with international AI governance frameworks such as those from European Union, OECD and Singapore.
Links: AI Verify Homepage, AI Verify documentation, AI Verify Github.
"},{"location":"projects/tad/existing-tools/comparison/tools/#to-investigate-further","title":"To investigate further","text":""},{"location":"projects/tad/existing-tools/comparison/tools/#verifyml","title":"VerifyML","text":"What is it? VerifyML is an opinionated, open-source toolkit and workflow to help companies implement human-centric AI practices. It seems pretty much equivalent to AI Verify.
Why interesting? The functionality of this toolkit seems to closely match that of AI Verify. It has a \"git and code first\" approach and supports automatic generation of model cards.
Remarks The code seems to have been last updated two years ago.
Links: VerifyML, VerifyML GitHub
"},{"location":"projects/tad/existing-tools/comparison/tools/#ibm-research-360-toolkit","title":"IBM Research 360 Toolkit","text":"What is it? Open source Python libraries that supports interpretability and explainability of datasets and machine learning models. Most relevant toolkits are the AI Fairness 360 and AI Explainability 360.
Why interesting? Seems to encompass extensive fairness and explainability tests. Codebase seems to be active.
Remarks It comes as Python and R libraries.
Links: AI Fairness 360 Github, AI Explainability 360 Github.
"},{"location":"projects/tad/existing-tools/comparison/tools/#holistic-ai","title":"Holistic AI","text":"What is it? Open source tool to assess and improve the trustworthiness of AI systems. Offers tools to measure and mitigate bias across numerous tasks. Will be extended to include tools for efficacy, robustness, privacy and explainability.
Why interesting? Although it is not entirely clear what exactly this tool does (see Remarks), it does seem (according to their website) to provide reports on bias and fairness. The GitHub repository does not seem to include any report-generating code, but mainly technical tests. Here is an example in which bias is measured in a classification model.
Remarks The website seems to suggest the possibility of generating reports, but this is not directly reflected in the codebase. Possibly reports are only available as part of a licensed product.
Links: Holistic AI Homepage, Holistic AI Github.
"},{"location":"projects/tad/existing-tools/comparison/tools/#ai-assessment-tool","title":"AI Assessment Tool","text":"What is it? The tool is based on the ALTAI published by the European Commission. It is more of a discussion tool about AI Systems.
Why interesting? Although it only includes questionnaires, it does offer an interesting way of reporting the end results. Discussions on, for example, the IAMA can also be documented within the tool.
Remarks The EU's own tool is not open source, but the Belgian tool is. It does not include any technical tests at this point.
Links: AI Assessment Tool Belgium homepage, AI Assessment Tool Belgium Github.
"},{"location":"projects/tad/existing-tools/comparison/tools/#interesting-to-mention","title":"Interesting to mention","text":"What-if. Provides interface for expanding understanding of a black-box classification or regression ML model. Can be accessed through TensorBoard or as an extension in a Jupyter or Colab notebook. Does not seem to be an active codebase.
Aequitas. Open source bias auditing and Fair ML toolkit. This already seems to be contained within AI Verify, at least the 'fairness tree'.
Facets. Open source toolkit for understanding and analyzing ML datasets. Note that it does not include ML models.
Fairness Indicators. Open source Python package which enables easy computation of commonly-identified fairness metrics for binary and multiclass classifiers. Part of TensorFlow.
Fairlearn. Open source Python package that empowers developers of AI systems to assess their system's fairness and mitigate any observed unfairness issues.
Dalex. The DALEX package x-rays any model, helping to explore and explain its behavior and to understand how complex models work. The main function explain() creates a wrapper around a predictive model. Wrapped models may then be explored and compared with a collection of local and global explainers. It incorporates recent developments from the area of Interpretable Machine Learning/eXplainable Artificial Intelligence.
SigmaRed. The SigmaRed platform enables comprehensive third-party AI risk management (AI TPRM) and rapidly reduces the cycle time of conducting AI risk assessments while providing deep visibility, control, stakeholder-based reporting, and a detailed evidence repository. Does not seem to be open source.
Anch.ai. The end-to-end cloud solution empowers global data-driven organizations to govern and deploy responsible, transparent, and explainable AI aligned with the upcoming EU AI Act. Does not seem to be open source.
CredoAI. Credo AI is an AI governance platform that helps companies adopt, scale, and govern AI safely and effectively. Does not seem to be open source.
Paper by TNO about the FATE system. Acronym stands for \"FAir, Transparent and Explainable Decision Making.\"
Tools mentioned include some of the above: Aequitas, AI Fairness 360, Dalex, Fairlearn, Responsibly, and What-If-Tool.
Links: Paper, Article, Microsoft links.
"},{"location":"projects/tad/existing-tools/comparison/tools_comparison/","title":"Comparison of tools for transparency of algorithmic decision making","text":"We have researched a few tools which we want to investigate further, this document is the next step in that investigation. We created a checklist to compare these tools against. The Fulfilled column will give a numerical value based on whether that requirement is fulfilled or not between 0 and 1. Then the actual scoring is the fulfilled value times the priority (the priority is translated to numerical values in the following way: {M:4, S:3, C:2, W:-1}).
"},{"location":"projects/tad/existing-tools/comparison/tools_comparison/#summary-of-the-comparison","title":"Summary of the comparison","text":"Requirement AIVerify VerifyML IBM 360 Research Toolkit Holistic AI AI Assessment Tool Functionality 36 42 20 17 22.85 Reliability 13 4 16 16 15.4 Usability 9.4 0 0 0 13 Help & Documentation 2.8 1.5 6.4 1.6 0.55 Performance Efficiency 7.5 11 11 11 11 Maintainability 15.8 24.5 29 23.5 25.6 Security 8.3 2 2 2 7.5 Compatibility 12.5 14 14 10 11 Accessibility 0 0 0 0 0.3 Portability 10.5 4.5 5.1 7.5 11.4 Deployment 1.5 0.6 1.2 3.6 3 Legal & Compliance 19 16 16 16 19 Total 136.3 120.1 120.7 108.2 140.6"},{"location":"projects/tad/existing-tools/comparison/tools_comparison/#notable-differences-between-the-tools","title":"Notable differences between the tools","text":"AIVerify notes:
Technical tests are supported, but they can be quite slow because of the tool's overhead
More flexibility would need to be built in before people could use the technical tests
If you have many variables, you are not able to show them all in the PDF
The error messages explaining why technical tests do not work on a model are not user-friendly
VerifyML notes:
This tool is no longer actively developed; the parties involved have shifted their focus to AIVerify
This tool does not support assessments
IBM 360 toolkit notes:
The toolkit has strong backing from industry and the community
It includes many technical tests from the latest research and also supports mitigation algorithms
It is purely for developers and therefore has no support for assessments
Holistic AI:
Like the IBM 360 Toolkit, it differentiates between different types of technical assessments, such as bias and explainability, but it is less extensive than the 360 toolkit
Holistic AI's ambition is large: they want to capture efficacy, robustness and privacy tests as well
It is a private company from the United Kingdom that has open-sourced part of its tool
AI Assessment Tool:
This tool does not have any technical tests, but outshines the others with its option to discuss assessments
It is also very performant
AIVerify
is a tool with a UI to execute both assessments and technical tests.
VerifyML
is a Python package to generate Model Cards.
Holistic AI
is a Python package to test for and mitigate Bias in your model.
IBM 360 Research Toolkit
is a Python and R package to test for Fairness & Explainability of your model.
AI Assessment Tool
is a tool with a UI to execute assessments and log discussions.
This document describes the Transparency of Algorithmic Decision making (TAD) Reporting Standard.
For reproducibility, governance, auditing and sharing of algorithmic systems it is essential to have a reporting standard so that information about an algorithmic system can be shared. This reporting standard describes how information about the different phases of an algorithm's life cycle can be reported. It contains, among other things, descriptive information combined with information about the technical tests and assessments applied.
Disclaimer
The TAD Reporting Standard is work in progress. This means that the current standard is probably suboptimal and will change significantly in future versions.
"},{"location":"projects/tad/reporting-standard/#introduction","title":"Introduction","text":"Inspired by Model Cards for Model Reporting and Papers with Code Model Index this standard almost1 2 3 4 extends the Hugging Face model card metadata specification to allow for:
metrics that do not fit in the metrics_field from the Hugging Face metadata specification; technical measurements, captured in measurements; and regulatory assessments, captured in assessments.
Following Hugging Face, this proposed standard will be written in YAML.
This standard does not contain all fields present in the Hugging Face metadata specification. The fields that are optional in the Hugging Face specification and are specific to the Hugging Face interface are omitted.
Another difference is that we divide our implementation into three separate parts.
system_card
, containing information about a group of ML-models which accomplish a specific task.model_card
, containing information about a specific data science model.assessment_card
, containing information about a regulatory assessment.Include statements
These model_card
s and assessment_card
s can be included verbatim into a system_card
, or referenced with an !include
statement, allowing for minimal cards to be compact in a single file. Extensive cards can be split up for readability and maintainability. Our standard allows for the !include
to be used anywhere.
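A minimal sketch of both styles, reusing the placeholder convention from the examples later in this document:

# Verbatim: the model card is embedded in the system card itself.
models:
  - name: {model_id}
    model: {model_uri}

# Referenced: the model card lives in its own YAML file.
models:
  - !include {model_card_uri}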
The standard will be written in YAML. Example YAML files are given in the next section. The standard defines three cards: a system_card
, a model_card
and an assessment_card
. A system_card
contains information about an algorithmic system. It can have multiple models and each of these models should have a model_card
. Regulatory assessments can be processed in an assessment_card
. Note that model_card
's and assessment_card
's can be included directly into the system_card
or can be included as separate YAML files with the help of a YAML-include mechanism. For clarity the latter is preferred and is also used in the examples in the next section.
system_card
","text":"A system_card
contains the following information.
schema_version
(REQUIRED, string). Version of the schema used, for example \"0.1a2\".provenance
(OPTIONAL). In case this System Card is generated from another source file, this field can capture the historical context of the contents of this System Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this System Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, i.e. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of person that initiated the transformations.name
(OPTIONAL, string). Name used to describe the system.
upl
(OPTIONAL, string). If this algorithm is part of a product offered by the Dutch Government, it should contain a URI from the Uniform Product List.owners
(OPTIONAL, list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the model. If oin
is NOT provided this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.description
(OPTIONAL, string). A short description of the system.
labels
(OPTIONAL, list). This field allows storing meta information about a system. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). Name of the label.value
(OPTIONAL, string). Value of the label.status
(OPTIONAL, string). The status of the system. For example the status can be \"production\".
publication_category
(OPTIONAL, enum[string]). The publication category of the algorithm should be chosen from [\"high_risk\", \"other\"]
.begin_date
(OPTIONAL, string). The first date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.end_date
(OPTIONAL, string). The last date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.goal_and_impact
(OPTIONAL, string). The purpose of the system and the impact it has on citizens and companies.considerations
(OPTIONAL, string). The pros and cons of using the system.risk_management
(OPTIONAL, string). Description of the risks associated with the system.human_intervention
(OPTIONAL, string). A description of to what extent there is human involvement in the system.legal_base
(OPTIONAL, list). If there exists a legal base for the process the system is embedded in, this field can be filled in with the relevant laws. There can be multiple legal bases. For each legal base the following fields are present.
name
(OPTIONAL, string). Name of the law.link
(OPTIONAL, string). URI pointing towards the contents of the law.used_data
(OPTIONAL, string). An overview of the data that is used in the system.
technical_design
(OPTIONAL, string). Description of how the system works.external_providers
(OPTIONAL, list). If relevant, this field allows storing information on external providers. There can be multiple external providers.
name
(OPTIONAL, string). Name of the external provider.version
(OPTIONAL, string). Version of the external provider reflecting its relation to previous versions.references
(OPTIONAL, list[string]). Additional reference URIs that point to relevant information about the system.
interaction_details
(OPTIONAL, list[string]). Explain how the AI system interacts with hardware or software, including other AI systems, or how the AI system can be used to interact with hardware or software.version_requirements
(OPTIONAL, list[string]). Describe the versions of the relevant software or firmware, and any requirements related to version updates.deployment_variants
(OPTIONAL, list[string]). Description of all the forms in which the AI system is placed on the market or put into service, such as software packages embedded into hardware, downloads, or APIs.hardware_requirements
(OPTIONAL, list[string]). Provide a description of the hardware on which the AI system must be run.product_markings
(OPTIONAL, list[string]). If the AI system is a component of products, photos, or illustrations, describe the external features, markings, and internal layout of those products.user_interface
(OPTIONAL, list). Provide information on the user interface provided to the user responsible for its operation.
description
(OPTIONAL, string). A description of the provided user interface.link
(OPTIONAL, string). A link to the user interface can be included.snapshot
(OPTIONAL, string). A snapshot/screenshot of the user interface can be included with the use of a hyperlink.models
(OPTIONAL, list[ModelCard]). A list of model cards (as defined below) or !include
s of a YAML file containing a model card. This model card can for example be a model card described in the next section or a model card from Hugging Face. There can be multiple model cards, meaning multiple models are used.
assessments
(OPTIONAL, list[AssessmentCard]). A list of assessment cards (as defined below) or !include
s of a YAML file containing an assessment card. This assessment card is an assessment card described in the next section. There can be multiple assessment cards, meaning multiple assessments were performed.
model_card
","text":"A model_card
contains the following information.
provenance
(OPTIONAL). In case this Model Card is generated from another source file, this field can capture the historical context of the contents of this Model Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this Model Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, i.e. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of person that initiated the transformations.language
(OPTIONAL, list[string]). If relevant, the natural languages the model supports in ISO 639. There can be multiple languages.
license
(REQUIRED).
license_name
(REQUIRED, string). Any license from the open source license list1. If the license is NOT present in the license list this field must be set to 'other', in which case the following license_link field will be REQUIRED.license_link
(OPTIONAL, string). A link to a file of that name inside the repo, or a URL to a remote file containing the license contents.tags
(OPTIONAL, list[string]). Tags with keywords to describe the project. There can be multiple tags.
owners
(OPTIONAL, list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the model. If oin
is NOT provided this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.model_index
(REQUIRED, list). There can be multiple models. For each model the following fields are present.
name
(REQUIRED, string). The name of the model.model
(REQUIRED, string). A URI pointing to a repository containing the model file.artifacts
(OPTIONAL, list). A list of artifacts
uri
(OPTIONAL, string). A URI referring to a relevant model artifact.content-type
(OPTIONAL, string). The type of the artifact, following the Content-Type convention. A recognized value is \"application/onnx\", referring to an ONNX representation of the model.md5-checksum
(OPTIONAL, string). A checksum for the content of the file.parameters
(OPTIONAL, list). There can be multiple parameters. For each parameter the following fields are present.
name
(REQUIRED, string). The name of the parameter, for example \"epochs\".dtype
(OPTIONAL, string). The datatype of the parameter, for example \"int\".value
(OPTIONAL, string). The value of the parameter, for example 100.labels
(OPTIONAL, list). This field allows storing meta information about a parameter. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the label.dtype
(OPTIONAL, string). The datatype of the label. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the label. If name
is set, this field is REQUIRED.results
(OPTIONAL, list). There can be multiple results. For each result the following fields are present.
task
(OPTIONAL, list).
task_type
(REQUIRED, string). The task of the model, for example \"object-classification\".task_name
(OPTIONAL, string). A pretty name for the model tasks, for example \"Object Classification\".datasets
(OPTIONAL, list). There can be multiple datasets 2. For each dataset the following fields are present.
type
(REQUIRED, string). The type of the dataset, can be a dataset id from Hugging Face datasets or any other link to a repository containing the dataset3, for example \"common_voice\".name
(REQUIRED, string). A pretty name for the dataset, for example \"Common Voice (French)\".split
(OPTIONAL, string). The split of the dataset, for example \"train\".features
(OPTIONAL, list[string]). List of feature names.revision
(OPTIONAL, string). Version of the dataset, for example \"5503434ddd753f426f4b38109466949a1217c2bb\".metrics
(OPTIONAL, list). There can be multiple metrics. For each metric the following fields are present.
type
(REQUIRED, string). A metric-id from Hugging Face metrics4, for example accuracy.name
(REQUIRED, string). A descriptive name of the metric. For example \"false positive rate\" is not a descriptive name, but \"training false positive rate w.r.t class x\" is.dtype
(REQUIRED, string). The data type of the metric, for example float
.value
(REQUIRED, string). The value of the metric.labels
(OPTIONAL, list). This field allows storing meta information about a metric. For example, metrics can be computed on subgroups of specific features: one can compute the accuracy for examples where the feature \"gender\" is set to \"male\". There can be multiple subgroups, which means that the metric is computed on the intersection of those subgroups. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the feature. For example: \"gender\".type
(OPTIONAL, string). The type of the label. Can for example be set to \"feature\" or \"output_class\". If name
is set, this field is REQUIRED.dtype
(OPTIONAL, string). The datatype of the feature, for example float
. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the feature. If name
is set, this field is REQUIRED. For example: \"male\".measurements
.
bar_plots
(OPTIONAL, list). The purpose of this field is to capture bar plot like measurements, for example SHAP values. There can be multiple bar plots. For each bar plot the following fields are present.
type
(REQUIRED, string). The type of bar plot, for example \"SHAP\".name
(OPTIONAL, string). A pretty name for the plot, for example \"Mean Absolute SHAP Values\".results
(REQUIRED, list). The contents of the bar plot. A result represents a bar. There can be multiple results. For each result the following fields are present.
name
(REQUIRED, string). The name of the bar.value
(REQUIRED, float). The value of the corresponding bar.graph_plots
(OPTIONAL, list). The purpose of this field is to capture graph plot like measurements, such as partial dependence plots. There can be multiple graph plots. For each graph plot the following fields are present.
type
(REQUIRED, string). The type of the graph plot, for example \"partial_dependence\".name
(OPTIONAL, string). A pretty name of the graph, for example \"Partial Dependence Plot\".results
(REQUIRED, list). Contains the graph plot data. Each graph can depend on a specific output class and feature. There can be multiple results. For each result the following fields are present.
class
(OPTIONAL, string/int/float/bool). The output class name that the graph corresponds to. This field is not always present.feature
(REQUIRED, string). The feature the graph corresponds to. This is required, since all relevant graphs are dependent on features.data
(REQUIRED, list)
x_value
(REQUIRED, float). The $x$-value of the graph.y_value
(REQUIRED, float). The $y$-value of the graph.
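To make the metric labels and the measurements field concrete, here is a minimal sketch of the metrics and measurements parts of a result; the feature names and values are hypothetical. The metric carries two labels, so it is computed on the intersection of the two subgroups, and the bar plot captures SHAP-style values:

metrics:
  - type: accuracy
    name: "accuracy w.r.t. class 0, restricted to gender:male and age:21"
    dtype: float
    value: 0.85
    labels:
      - name: gender
        type: feature
        dtype: string
        value: male
      - name: age
        type: feature
        dtype: int
        value: 21
measurements:
  bar_plots:
    - type: SHAP
      name: Mean Absolute SHAP Values
      results:
        - name: age
          value: 0.23
        - name: gender
          value: 0.11

assessment_card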
","text":"An assessment_card
contains the following information.
provenance
(OPTIONAL). In case this Assessment Card is generated from another source file, this field can capture the historical context of the contents of this Assessment Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this Assessment Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, i.e. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of person that initiated the transformations.name
(REQUIRED, string). The name of the assessment.
urn
(OPTIONAL, string). A Uniform Resource Name (URN) of the instrument in the instrument register.date
(REQUIRED, string). The date at which the assessment is completed. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.contents
(REQUIRED, list). There can be multiple items in contents. For each item the following fields are present:
question
(REQUIRED, string). A question.urn
(OPTIONAL, string). A Uniform Resource Name (URN) of the corresponding task in the instrument register.answer
(REQUIRED, string). An answer.remarks
(OPTIONAL, string). A field to put relevant discussion remarks in.authors
(OPTIONAL, list). There can be multiple names. For each name the following field is present.
name
(OPTIONAL, string). The name of the author of the question.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of the answer. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, i.e. 2024-04-16T16:48:14Z
.
version: {system_card_version}\nprovenance:\n git_commit_hash: {git_commit_hash}\n timestamp: {modification_timestamp}\n uri: {modification_uri}\n author: {modification_author}\nname: {system_name}\nupl: {upl_uri}\nowners:\n - oin: {oin}\n organization: {organization_name}\n name: {owner_name}\n email: {owner_email}\n role: {owner_role}\ndescription: {system_description}\nlabels:\n - name: {label_name}\n value: {label_value}\nstatus: {system_status}\npublication_category: {system_publication_cat}\nbegin_date: {system_begin_date}\nend_date: {system_end_date}\ngoal_and_impact: {system_goal_and_impact}\nconsiderations: {system_considerations}\nrisk_management: {system_risk_management}\nhuman_intervention: {system_human_intervention}\nlegal_base:\n - name: {law_name}\n link: {law_uri}\nused_data: {system_used_data}\ntechnical_design: {technical_design}\nexternal_providers:\n - name: {name_external_provider}\n version: {version_external_provider}\nreferences:\n - {reference_uri}\ninteraction_details:\n - {system_interaction_details}\nversion_requirements:\n - {system_version_requirements}\ndeployment_variants:\n - {system_deployment_variants}\nhardware_requirements:\n - {system_hardware_requirements}\nproduct_markings:\n - {system_product_markings}\nuser_interface:\n - description: {system_user_interface}\n link: {system_user_interface_uri}\n snapshot: {system_user_interface_snapshot_uri}\n\nmodels:\n - !include {model_card_uri}\n\nassessments:\n - !include {assessment_card_uri}\n
"},{"location":"projects/tad/reporting-standard/#model-card","title":"Model Card","text":"provenance:\n git_commit_hash: {git_commit_hash}\n timestamp: {modification_timestamp}\n uri: {modification_uri}\n author: {modification_author}\nlanguage:\n - {lang_0}\nlicense:\n license_name: {license_name}\n license_link: {license_uri}\ntags:\n - {tag_0}\nowners:\n - oin: {oin}\n organization: {organization_name}\n name: {owner_name}\n email: {owner_email}\n role: {owner_role}\n\nmodel-index:\n - name: {model_id}\n model: {model_uri}\n artifacts:\n - uri: {model_artifact_uri}\n - content-type: {model_artifact_type}\n - md5-checksum: {md5_checksum}\n parameters:\n - name: {parameter_name}\n dtype: {parameter_dtype}\n value: {parameter_value}\n labels:\n - name: {label_name}\n dtype: {label_type}\n value: {label_value}\n results:\n - task:\n - type: {task_type}\n name: {task_name}\n datasets:\n - type: {dataset_type}\n name: {dataset_name}\n split: {split}\n features:\n - {feature_name}\n revision: {dataset_version}\n metrics:\n - type: {metric_type}\n name: {metric_name}\n dtype: {metric_dtype}\n value: {metric_value}\n labels:\n - name: {label_name}\n type: {label_type}\n dtype: {label_type}\n value: {label_value}\n measurements:\n bar_plots:\n - type: {measurement_type}\n name: {measurement_name}\n results:\n - name: {bar_name}\n value: {bar_value}\n graph_plots:\n - type: {measurement_type}\n name: {measurement_name}\n results:\n - class: {class_name}\n feature: {feature_name}\n data:\n - x_value: {x_value}\n y_value: {y_value}\n
"},{"location":"projects/tad/reporting-standard/#assessment-card","title":"Assessment Card","text":"provenance:\n git_commit_hash: {git_commit_hash}\n timestamp: {modification_timestamp}\n uri: {modification_uri}\n author: {modification_author}\nname: {assessment_name}\nurn: {urn}\ndate: {assessment_date}\ncontents:\n - question: {question_text}\n urn: {urn}\n answer: {answer_text}\n remarks: {remarks_text}\n authors:\n - name: {author_name}\n timestamp: {timestamp}\n
"},{"location":"projects/tad/reporting-standard/#schema","title":"Schema","text":"JSON schema will be added when we publish the first beta version.
"},{"location":"projects/tad/reporting-standard/#changelog","title":"Changelog","text":"Deviation from the Hugging Face specification is in the License field. Hugging Face only accepts dataset id's from Hugging Face license list while we accept any license from Open Source License List.\u00a0\u21a9\u21a9
Deviation from the Hugging Face specification is in the model_index:results:dataset
field. Hugging Face only accepts one dataset, while we accept a list of datasets.\u00a0\u21a9\u21a9
Deviation from the Hugging Face specification is in the Dataset Type field. Hugging Face only accepts dataset id's from Hugging Face datasets while we also allow for any URL pointing to the dataset.\u00a0\u21a9\u21a9
For this extension to work, relevant metrics (such as, for example, the false positive rate) have to be added to the Hugging Face metrics; possibly this can be done in our organizational namespace.\u00a0\u21a9\u21a9
This document describes the Transparency of Algorithmic Decision making (TAD) Reporting Standard.
For reproducibility, governance, auditing and sharing of algorithmic systems it is essential to have a reporting standard so that information about an algorithmic system can be shared. This reporting standard describes how information about the different phases of an algorithm's life cycle can be reported. It contains, among other things, descriptive information combined with information about the technical tests and assessments applied.
Disclaimer
The TAD Reporting Standard is work in progress. This means that the current standard is probably suboptimal and will change significantly in future versions.
"},{"location":"projects/tad/reporting-standard/0.1a1/#introduction","title":"Introduction","text":"Inspired by Model Cards for Model Reporting and Papers with Code Model Index this standard almost 1 2 3 4 extends the Hugging Face model card metadata specification to allow for:
metrics that do not fit in the metrics_field from the Hugging Face metadata specification; technical measurements, captured in measurements; and regulatory assessments, captured in assessments.
Following Hugging Face, this proposed standard will be written in yaml.
This standard does not contain all fields present in the Hugging Face metadata specification. The fields that are optional in the Hugging Face specification and are specific to the Hugging Face interface are omitted.
Another difference is that we divide our implementation into three separate parts.
system_card
, containing information about a group of ML-models which accomplish a specific task.model_card
, containing information about a specific data science model.assessment_card
, containing information about a regulatory assessment.Include statements
These model_card
s and assessment_card
s can be included verbatim into a system_card
, or referenced with an !include
statement, allowing for minimal cards to be compact in a single file. Extensive cards can be split up for readability and maintainability. Our standard allows for the !include
to be used anywhere.
The standard will be written in yaml. Example yaml files are given in the next section. The standard defines three cards: a system_card
, a model_card
and an assessment_card
. A system_card
contains information about an algorithmic system. It can have multiple models and each of these models should have a model_card
. Regulatory assessments can be processed in an assessment_card
. Note that model_card
's and assessment_card
's can be included directly into the system_card
or can be included as separate yaml files with the help of a yaml-include mechanism. For clarity the latter is preferred and is also used in the examples in the next section.
system_card
","text":"A system_card
contains the following information.
schema_version
(REQUIRED, string). Version of the schema used, for example \"0.1a1\".name
(OPTIONAL, string). Name used to describe the system.upl
(OPTIONAL, string). If this algorithm is part of a product offered by the Dutch Government, it should contain a URI from the Uniform Product List.owners
(list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the model. If oin
is NOT provided this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.description
(OPTIONAL, string). A short description of the system.
labels
(OPTIONAL, list). This field allows storing meta information about a system. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). Name of the label.value
(OPTIONAL, string). Value of the label.status
(OPTIONAL, string). The status of the system. For example the status can be \"production\".
publication_category
(OPTIONAL, enum[string]). The publication category of the algorithm should be chosen from [\"high_risk\", \"other\"]
.begin_date
(OPTIONAL, string). The first date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD.end_date
(OPTIONAL, string). The last date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD.goal_and_impact
(OPTIONAL, string). The purpose of the system and the impact it has on citizens and companies.considerations
(OPTIONAL, string). The pros and cons of using the system.risk_management
(OPTIONAL, string). Description of the risks associated with the system.human_intervention
(OPTIONAL, string). A description of to what extent there is human involvement in the system.legal_base
(OPTIONAL, list). If there exists a legal base for the process the system is embedded in, this field can be filled in with the relevant laws. There can be multiple legal bases. For each legal base the following fields are present.name
(OPTIONAL, string). Name of the law.link
(OPTIONAL, string). URI pointing towards the contents of the law.used_data
(OPTIONAL, string). An overview of the data that is used in the system.technical_design
(OPTIONAL, string). Description of how the system works.external_providers
(OPTIONAL, list[string]). Name of an external provider, if relevant. There can be multiple external providers.references
(OPTIONAL, list[string]). Additional reference URIs that point to relevant information about the system.models
(OPTIONAL, list[ModelCard]). A list of model cards (as defined below) or !include
s of a yaml file containing a model card. This model card can for example be a model card described in the next section or a model card from Hugging Face. There can be multiple model cards, meaning multiple models are used.assessments
(OPTIONAL, list[AssessmentCard]). A list of assessment cards (as defined below) or !include
s of a yaml file containing an assessment card. This assessment card is an assessment card described in the next section. There can be multiple assessment cards, meaning multiple assessments were performed.model_card
","text":"A model_card
contains the following information.
language
(OPTIONAL, list[string]). If relevant, the natural languages the model supports in ISO 639. There can be multiple languages.license
(REQUIRED, string). Any license from the open source license list 1. If the license is NOT present in the license list this field must be set to 'other' and the following two fields will be REQUIRED.
license_name
(string). An id for the license.license_link
(string). A link to a file of that name inside the repo, or a URL to a remote file containing the license contents.tags
(OPTIONAL, list[string]). Tags with keywords to describe the project. There can be multiple tags.
owners
(list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the model. If oin
is NOT provided this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.There can be multiple models. For each model the following fields are present.
name
(REQUIRED, string). The name of the model.model
(REQUIRED, string). A URI pointing to a repository containing the model file.artifacts
(OPTIONAL, list[string]). A list of URIs, where each URI refers to a relevant model artifact that cannot be captured by any other field but is relevant to the model.parameters
(list). There can be multiple parameters. For each parameter the following fields are present.
name
(REQUIRED, string). The name of the parameter, for example \"epochs\".dtype
(OPTIONAL, string). The datatype of the parameter, for example \"int\".value
(OPTIONAL, string). The value of the parameter, for example 100.labels
(list). This field allows storing meta information about a parameter. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the label.dtype
(OPTIONAL, string). The datatype of the feature. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the feature. If name
is set, this field is REQUIRED.results
(list). There can be multiple results. For each result the following fields are present.
task
(OPTIONAL, list).
task_type
(REQUIRED, string). The task of the model, for example \"object-classification\".task_name
(OPTIONAL, string). A pretty name for the model tasks, for example \"Object Classification\".datasets
(list). There can be multiple datasets 2. For each dataset the following fields are present.
type
(REQUIRED, string). The type of the dataset, can be a dataset id from Hugging Face datasets or any other link to a repository containing the dataset3, for example \"common_voice\".name
(REQUIRED, string). A pretty name for the dataset, for example \"Common Voice (French)\".split
(OPTIONAL, string). The split of the dataset, for example \"train\".features
(OPTIONAL, list[string]). List of feature names.revision
(OPTIONAL, string). Version of the dataset, for example 5503434ddd753f426f4b38109466949a1217c2bb.metrics
(list). There can be multiple metrics. For each metric the following fields are present.
type
(REQUIRED, string). A metric-id from Hugging Face metrics4, for example accuracy.name
(REQUIRED, string). A descriptive name of the metric. For example \"false positive rate\" is not a descriptive name, but \"training false positive rate w.r.t class x\" is.dtype
(REQUIRED, string). The data type of the metric, for example float
.value
(REQUIRED, string). The value of the metric.labels
(list). This field allows storing meta information about a metric. For example, metrics can be computed on subgroups of specific features: one can compute the accuracy for examples where the feature \"gender\" is set to \"male\". There can be multiple subgroups, which means that the metric is computed on the intersection of those subgroups. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the feature. For example: \"gender\".type
(OPTIONAL, string). The type of the label. Can for example be set to \"feature\" or \"output_class\". If name
is set, this field is REQUIRED.dtype
(OPTIONAL, string). The datatype of the feature, for example float
. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the feature. If name
is set, this field is REQUIRED. For example: \"male\".measurements
.
bar_plots
(list). The purpose of this field is to capture bar plot like measurements, for example SHAP values. There can be multiple bar plots. For each bar plot the following fields are present.
type
(REQUIRED, string). The type of bar plot, for example \"SHAP\".name
(OPTIONAL, string). A pretty name for the plot, for example \"Mean Absolute SHAP Values\".results
(list). The contents of the bar plot. A result represents a bar. There can be multiple results. For each result the following fields are present.name
(REQUIRED, string). The name of the bar.value
(REQUIRED, float). The value of the corresponding bar.graph_plots
(list). The purpose of this field is to capture graph plot like measurements, such as partial dependence plots. There can be multiple graph plots. For each graph plot the following fields are present.
type
(REQUIRED, string). The type of the graph plot, for example \"partial_dependence\".name
(OPTIONAL, string). A pretty name of the graph, for example \"Partial Dependence Plot\".results
(list). Contains the graph plot data. Each graph can depend on a specific output class and feature. There can be multiple results. For each result the following fields are present.class
(OPTIONAL, string/int/float/bool). The output class name that the graph corresponds to. This field is not always present.feature
(REQUIRED, string). The feature the graph corresponds to. This is required, since all relevant graphs are dependent on features.data
(list)x_value
(REQUIRED, float). The $x$-value of the graph.y_value
(REQUIRED, float). The $y$-value of the graph.assessment_card
","text":"An assessment_card
contains the following information.
name
(REQUIRED, string). The name of the assessment.date
(REQUIRED, string). The date at which the assessment is completed. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD.contents
(list). There can be multiple items in contents. For each item the following fields are present:
question
(REQUIRED, string). A question.answer
(REQUIRED, string). An answer.remarks
(OPTIONAL, string). A field to put relevant discussion remarks in.authors
. There can be multiple names. For each name the following field is present.name
(OPTIONAL, string). The name of the author of the question.timestamp
(OPTIONAL, string). A timestamp of the date and time of the answer.version: {system_card_version} # Optional. Example: \"0.1a1\"\nname: {system_name} # Optional. Example: \"AangifteVertrekBuitenland\"\nupl: {upl_uri} # Optional. Example: https://standaarden.overheid.nl/owms/terms/AangifteVertrekBuitenland\nowners:\n- oin: {oin} # Optional. Example: 00000001003214345000\n organization: {organization_name} # Optional if oin is provided, Required otherwise. Example: BZK\n name: {owner_name} # Optional. Example: John Doe\n email: {owner_email} # Optional. Example: johndoe@email.com\n role: {owner_role} # Optional. Example: Data Scientist.\ndescription: {system_description} # Optional. Short description of the system.\nlabels: # Optional. Labels to store metadata about the system.\n- name: {label_name} # Optional.\n value: {label_value} # Optional.\nstatus: {system_status} # Optional. Example \"production\".\npublication_category: {system_publication_cat} # Optional. Example: \"impactful_algorithm\".\nbegin_date: {system_begin_date} # Optional. Example: 2025-1-1.\nend_date: {system_end_date} # Optional. Example: 2025-12-1.\ngoal_and_impact: {system_goal_and_impact} # Optional. Goal and impact of the system.\nconsiderations: {system_considerations} # Optional. Considerations about the system.\nrisk_management: {system_risk_management} # Optional. Description of risks associated with the system.\nhuman_intervention: {system_human_intervention} # Optional. Description of human involvement in the system.\nlegal_base:\n- name: {law_name} # Optional. Example: \"AVG\".\n link: {law_uri} # Optional. Example: \"https://eur-lex.europa.eu/legal-content/NL/TXT/HTML/?uri=CELEX:31995L0046\".\nused_data: {system_used_data} # Optional. Description of the data used by the system.\ntechnical_design: {technical_design} # Optional. Description of the technical design of the system.\nexternal_providers:\n- {system_external_provider} # Optional. Reference to used external providers.\nreferences:\n- {reference_uri} # Optional. Example: URI to codebase.\n\nmodels:\n- !include {model_card_uri} # Optional. Example: cat_classifier_model.yaml.\n\nassessments:\n- !include {assessment_card_uri} # Required. Example: iama.yaml.\n
"},{"location":"projects/tad/reporting-standard/0.1a1/#model-card","title":"Model Card","text":"language:\n - {lang_0} # Optional. Example nl.\nlicense: {license} # Required. Example: Apache-2.0 or any license SPDX ID from https://opensource.org/license or \"other\".\nlicense_name: {license_name} # Optional if license != other, Required otherwise. Example: 'my-license-1.0'\nlicense_link: {license_link} # Optional if license != other, Required otherwise. Specify \"LICENSE\" or \"LICENSE.md\" to link to a file of that name inside the repo, or a URL to a remote file.\ntags:\n- {tag_0} # Optional. Example: audio\n- {tag_1} # Optional. Example: automatic-speech-recognition\nowners:\n- organization: {organization_name} # Required. Example: BZK\n oin: {oin} # Optional. Example: 00000001003214345000\n name: {owner_name} # Optional. Example: John Doe\n email: {owner_email} # Optional. Example: johndoe@email.com\n role: {owner_role} # Optional. Example: Data Scientist.\n\nmodel-index:\n- name: {model_id} # Required. Example: CatClassifier.\n model: {model_uri} # Required. URI to a repository containing the model file.\n artifacts:\n - {model_artifact} # Optional. URI to relevant model artifacts, if applicable.\n parameters:\n - name: {parameter_name} # Optional. Example: \"epochs\".\n dtype: {parameter_dtype} # Optional. Example: \"int\".\n value: {parameter_value} # Optional. Example: 100.\n labels:\n - name: {label_name} # Optional. Example: \"gender\".\n dtype: {label_type} # Optional. Example: \"string\".\n value: {label_value} # Optional. Example: \"female\".\n results:\n - task:\n type: {task_type} # Required. Example: image-classification.\n name: {task_name} # Optional. Example: Image Classification.\n datasets:\n - type: {dataset_type} # Required. Example: common_voice. Link to a repository containing the dataset\n name: {dataset_name} # Required. Example: \"Common Voice (French)\". A pretty name for the dataset.\n split: {split} # Optional. Example: \"train\".\n features:\n - {feature_name} # Optional. Example: \"gender\".\n revision: {dataset_version} # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb\n metrics:\n - type: {metric_type} # Required. Example: false-positive-rate. Use metric id from https://hf.co/metrics.\n name: {metric_name} # Required. Example: \"FPR wrt class 0 restricted to feature gender:0 and age:21\".\n dtype: {metric_dtype} # Required. Example: \"float\".\n value: {metric_value} # Required. Example: 0.75.\n labels:\n - name: {label_name} # Optional. Example: \"gender\".\n type: {label_type} # Optional. Example: \"feature\".\n dtype: {label_type} # Optional. Example: \"string\".\n value: {label_value} # Optional. Example: \"female\".\n measurements:\n # Bar plots should be able to capture SHAP and Robustness Toolbox from AI Verify.\n bar_plots:\n - type: {measurement_type} # Required. Example: \"SHAP\".\n name: {measurement_name} # Optional. Example: \"Mean Absolute Shap Values\".\n results:\n - name: {bar_name} # Required. The name of a bar.\n value: {bar_value} # Required. The corresponding value.\n # Graph plots should be able to capture graph based measurements such as partial dependence and accumulated local effect.\n graph_plots:\n - type: {measurement_type} # Required. Example: \"partial_dependence\".\n name: {measurement_name} # Optional. Example: \"Partial Dependence Plot\".\n # Results store the graph plot data. 
So far all plots are dependent on a combination of a specific class (sometimes) and feature (always).\n # For example partial dependence plots are made for each feature and class.\n results:\n - class: {class_name} # Optional. Name of the output class the graph depends on.\n feature: {feature_name} # Required. Name of the feature the graph depends on.\n data:\n - x_value: {x_value} # Required. The x value of the graph data.\n y_value: {y_value} # Required. The y value of the graph data.\n
"},{"location":"projects/tad/reporting-standard/0.1a1/#assessment-card","title":"Assessment Card","text":"name: {assessment_name} # Required. Example: IAMA.\ndate: {assessment_date} # Required. Example: 25-03-2025.\ncontents:\n - question: {question_text} # Required. Example: \"Question 1: ...\".\n answer: {answer_text} # Required. Example: \"Answer: ...\".\n remarks: {remarks_text} # Optional. Example: \"Remarks: ...\".\n authors: # Optional. Example: \"['John', 'Peter']\".\n - name: {author_name}\n timestamp: {timestamp} # Optional. Example: 1711630721.\n
"},{"location":"projects/tad/reporting-standard/0.1a1/#schema","title":"Schema","text":"JSON schema will be added when we publish the first beta version.
Deviation from the Hugging Face specification is in the License field. Hugging Face only accepts license id's from the Hugging Face license list while we accept any license from the Open Source License List.\u00a0\u21a9\u21a9
Deviation from the Hugging Face specification is in the model_index:results:dataset
field. Hugging Face only accepts one dataset, while we accept a list of datasets.\u00a0\u21a9\u21a9
Deviation from the Hugging Face specification is in the Dataset Type field. Hugging Face only accepts dataset id's from Hugging Face datasets while we also allow for any URL pointing to the dataset.\u00a0\u21a9\u21a9
For this extension to work, relevant metrics (such as, for example, the false positive rate) have to be added to the Hugging Face metrics; possibly this can be done in our organizational namespace.\u00a0\u21a9\u21a9
This document describes the Transparency of Algorithmic Decision making (TAD) Reporting Standard.
For reproducibility, governance, auditing and sharing of algorithmic systems it is essential to have a reporting standard so that information about an algorithmic system can be shared. This reporting standard describes how information about the different phases of an algorithm's life cycle can be reported. It contains, among other things, descriptive information combined with information about the technical tests and assessments applied.
Disclaimer
The TAD Reporting Standard is work in progress. This means that the current standard is probably suboptimal and will change significantly in future versions.
"},{"location":"projects/tad/reporting-standard/0.1a2/#introduction","title":"Introduction","text":"Inspired by Model Cards for Model Reporting and Papers with Code Model Index this standard almost 1 2 3 4 extends the Hugging Face model card metadata specification to allow for:
metrics that do not fit in the metrics_field from the Hugging Face metadata specification; technical measurements, captured in measurements; and regulatory assessments, captured in assessments.
Following Hugging Face, this proposed standard will be written in yaml.
This standard does not contain all fields present in the Hugging Face metadata specification. The fields that are optional in the Hugging Face specification and are specific to the Hugging Face interface are omitted.
Another difference is that we divide our implementation into three separate parts.
system_card
, containing information about a group of ML-models which accomplish a specific task.model_card
, containing information about a specific data science model.assessment_card
, containing information about a regulatory assessment.Include statements
These model_card
s and assessment_card
s can be included verbatim into a system_card
, or referenced with an !include
statement, allowing for minimal cards to be compact in a single file. Extensive cards can be split up for readability and maintainability. Our standard allows for the !include
to be used anywhere.
The standard will be written in yaml. Example yaml files are given in the next section. The standard defines three cards: a system_card
, a model_card
and an assessment_card
. A system_card
contains information about an algorithmic system. It can have multiple models and each of these models should have a model_card
. Regulatory assessments can be processed in an assessment_card
. Note that model_card
's and assessment_card
's can be included directly into the system_card
or can be included as separate yaml files with the help of a yaml-include mechanism. For clarity the latter is preferred and is also used in the examples in the next section.
system_card
","text":"A system_card
contains the following information.
schema_version
(REQUIRED, string). Version of the schema used, for example \"0.1a2\".name
(OPTIONAL, string). Name used to describe the system.upl
(OPTIONAL, string). If this algorithm is part of a product offered by the Dutch Government, it should contain a URI from the Uniform Product List.owners
(list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the model. If oin
is NOT provided this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.description
(OPTIONAL, string). A short description of the system.
labels
(OPTIONAL, list). This field allows storing meta information about a system. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). Name of the label.value
(OPTIONAL, string). Value of the label.status
(OPTIONAL, string). The status of the system. For example the status can be \"production\".
publication_category
(OPTIONAL, enum[string]). The publication category of the algorithm should be chosen from [\"high_risk\", \"other\"]
.begin_date
(OPTIONAL, string). The first date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD.end_date
(OPTIONAL, string). The last date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD.goal_and_impact
(OPTIONAL, string). The purpose of the system and the impact it has on citizens and companies.considerations
(OPTIONAL, string). The pros and cons of using the system.risk_management
(OPTIONAL, string). Description of the risks associated with the system.human_intervention
(OPTIONAL, string). A description of to what extent there is human involvement in the system.legal_base
(OPTIONAL, list). If there exists a legal base for the process the system is embedded in, this field can be filled in with the relevant laws. There can be multiple legal bases. For each legal base the following fields are present.name
(OPTIONAL, string). Name of the law.link
(OPTIONAL, string). URI pointing towards the contents of the law.used_data
(OPTIONAL, string). An overview of the data that is used in the system.technical_design
(OPTIONAL, string). Description of how the system works.external_providers
(OPTIONAL, list[string]). Name of an external provider, if relevant. There can be multiple external providers.references
(OPTIONAL, list[string]). Additional reference URIs that point to relevant information about the system.models
(OPTIONAL, list[ModelCard]). A list of model cards (as defined below) or !include
s of a yaml file containing a model card. This model card can for example be a model card described in the next section or a model card from Hugging Face. There can be multiple model cards, meaning multiple models are used.assessments
(OPTIONAL, list[AssessmentCard]). A list of assessment cards (as defined below) or !include
s of a yaml file containing an assessment card, as described in the next section. There can be multiple assessment cards, meaning multiple assessments were performed.model_card
","text":"A model_card
contains the following information.
language
(OPTIONAL, list[string]). If relevant, the natural languages the model supports in ISO 639. There can be multiple languages.license
(REQUIRED, string). Any license from the open source license list 1. If the license is NOT present in the license list this field must be set to 'other' and the following two fields will be REQUIRED.
license_name
(string). An id for the license.license_link
(string). A link to a file of that name inside the repo, or a URL to a remote file containing the license contents.tags
(OPTIONAL, list[string]). Tags with keywords to describe the project. There can be multiple tags.
owners
(list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the model. If oin
is NOT provided this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.There can be multiple models. For each model the following fields are present.
name
(REQUIRED, string). The name of the model.model
(REQUIRED, string). A URI pointing to a repository containing the model file.artifacts
(OPTIONAL, list). A list of artifacts.
uri
(OPTIONAL, string). URI referring to a relevant model artifact.content-type
(OPTIONAL, string). The type of the artifact, following the Content-Type format. A recognized value is \"application/onnx\", referring to an ONNX representation of the model.md5-checksum
(OPTIONAL, string). MD5 checksum for the content of the file.parameters
(list). There can be multiple parameters. For each parameter the following fields are present.
name
(REQUIRED, string). The name of the parameter, for example \"epochs\".dtype
(OPTIONAL, string). The datatype of the parameter, for example \"int\".value
(OPTIONAL, string). The value of the parameter, for example 100.labels
(list). This field allows storing meta information about a parameter. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the label.dtype
(OPTIONAL, string). The datatype of the label. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the label. If name
is set, this field is REQUIRED.results
(list). There can be multiple results. For each result the following fields are present.
task
(OPTIONAL, list).
task_type
(REQUIRED, string). The task of the model, for example \"object-classification\".task_name
(OPTIONAL, string). A pretty name for the model tasks, for example \"Object Classification\".datasets
(list). There can be multiple datasets 2. For each dataset the following fields are present.
type
(REQUIRED, string). The type of the dataset, can be a dataset id from Hugging Face datasets or any other link to a repository containing the dataset3, for example \"common_voice\".name
(REQUIRED, string). A pretty name for the dataset, for example \"Common Voice (French)\".split
(OPTIONAL, string). The split of the dataset, for example \"train\".features
(OPTIONAL, list[string]). List of feature names.revision
(OPTIONAL, string). Version of the dataset, for example 5503434ddd753f426f4b38109466949a1217c2bb.metrics
(list). There can be multiple metrics. For each metric the following fields are present.
type
(REQUIRED, string). A metric-id from Hugging Face metrics4, for example accuracy.name
(REQUIRED, string). A descriptive name of the metric. For example \"false positive rate\" is not a descriptive name, but \"training false positive rate w.r.t class x\" is.dtype
(REQUIRED, string). The data type of the metric, for example float
.value
(REQUIRED, string). The value of the metric.labels
(list). This field allows storing meta information about a metric. Metrics can, for example, be computed on subgroups of specific features: one can compute the accuracy for examples where the feature \"gender\" is set to \"male\". There can be multiple subgroups, which means that the metric is computed on the intersection of those subgroups. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the feature. For example: \"gender\".type
(OPTIONAL, string). The type of the label. Can for example be set to \"feature\" or \"output_class\". If name
is set, this field is REQUIRED.dtype
(OPTIONAL, string). The datatype of the feature, for example float
. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the feature. If name
is set, this field is REQUIRED. For example: \"male\".measurements
.
bar_plots
(list). The purpose of this field is to capture bar-plot-like measurements, for example SHAP values. There can be multiple bar plots. For each bar plot the following fields are present.
type
(REQUIRED, string). The type of bar plot, for example \"SHAP\".name
(OPTIONAL, string). A pretty name for the plot, for example \"Mean Absolute SHAP Values\".results
(list). The contents of the bar plot. A result represents a bar. There can be multiple results. For each result the following fields are present.name
(REQUIRED, string). The name of the bar.value
(REQUIRED, float). The value of the corresponding bar.graph_plots
(list). The purpose of this field is to capture graph-plot-like measurements, such as partial dependence plots. There can be multiple graph plots. For each graph plot the following fields are present.
type
(REQUIRED, string). The type of the graph plot, for example \"partial_dependence\".name
(OPTIONAL, string). A pretty name of the graph, for example \"Partial Dependence Plot\".results
(list). Results contains the graph plot data. Each graph can depend on a specific output class and feature. There can be multiple results. For each result the following fields are present.class
(OPTIONAL, string/int/float/bool). The output class name that the graph corresponds to. This field is not always present.feature
(REQUIRED, string). The feature the graph corresponds to. This is required, since all relevant graphs are dependent on features.data
(list).x_value
(REQUIRED, float). The $x$-value of the graph.y_value
(REQUIRED, float). The $y$-value of the graph.assessment_card
","text":"An assessment_card
contains the following information.
name
(REQUIRED, string). The name of the assessment.date
(REQUIRED, string). The date at which the assessment is completed. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD.contents
(list). There can be multiple items in contents. For each item the following fields are present:
question
(REQUIRED, string). A question.answer
(REQUIRED, string). An answer.remarks
(OPTIONAL, string). A field to put relevant discussion remarks in.authors
. There can be multiple names. For each name the following field is present.name
(OPTIONAL, string). The name of the author of the question.timestamp
(OPTIONAL, string). A timestamp of the date and time of the answer.version: {system_card_version} # Optional. Example: \"0.1a1\"\nname: {system_name} # Optional. Example: \"AangifteVertrekBuitenland\"\nupl: {upl_uri} # Optional. Example: https://standaarden.overheid.nl/owms/terms/AangifteVertrekBuitenland\nowners:\n- oin: {oin} # Optional. Example: 00000001003214345000\n organization: {organization_name} # Optional if oin is provided, Required otherwise. Example: BZK\n name: {owner_name} # Optional. Example: John Doe\n email: {owner_email} # Optional. Example: johndoe@email.com\n role: {owner_role} # Optional. Example: Data Scientist.\ndescription: {system_description} # Optional. Short description of the system.\nlabels: # Optional. Labels to store metadata about the system.\n- name: {label_name} # Optional.\n value: {label_value} # Optional.\nstatus: {system_status} # Optional. Example: \"production\".\npublication_category: {system_publication_cat} # Optional. Example: \"high_risk\".\nbegin_date: {system_begin_date} # Optional. Example: 2025-01-01.\nend_date: {system_end_date} # Optional. Example: 2025-12-01.\ngoal_and_impact: {system_goal_and_impact} # Optional. Goal and impact of the system.\nconsiderations: {system_considerations} # Optional. Considerations about the system.\nrisk_management: {system_risk_management} # Optional. Description of risks associated with the system.\nhuman_intervention: {system_human_intervention} # Optional. Description of human involvement in the system.\nlegal_base:\n- name: {law_name} # Optional. Example: \"AVG\".\n link: {law_uri} # Optional. Example: \"https://eur-lex.europa.eu/legal-content/NL/TXT/HTML/?uri=CELEX:31995L0046\".\nused_data: {system_used_data} # Optional. Description of the data used by the system.\ntechnical_design: {technical_design} # Optional. Description of the technical design of the system.\nexternal_providers:\n- {system_external_provider} # Optional. Reference to used external providers.\nreferences:\n- {reference_uri} # Optional. Example: URI to codebase.\n\nmodels:\n- !include {model_card_uri} # Optional. Example: cat_classifier_model.yaml.\n\nassessments:\n- !include {assessment_card_uri} # Optional. Example: iama.yaml.\n
"},{"location":"projects/tad/reporting-standard/0.1a2/#model-card","title":"Model Card","text":"language:\n - {lang_0} # Optional. Example nl.\nlicense: {license} # Required. Example: Apache-2.0 or any license SPDX ID from https://opensource.org/license or \"other\".\nlicense_name: {license_name} # Optional if license != other, Required otherwise. Example: 'my-license-1.0'\nlicense_link: {license_link} # Optional if license != other, Required otherwise. Specify \"LICENSE\" or \"LICENSE.md\" to link to a file of that name inside the repo, or a URL to a remote file.\ntags:\n- {tag_0} # Optional. Example: audio\n- {tag_1} # Optional. Example: automatic-speech-recognition\nowners:\n- organization: {organization_name} # Required. Example: BZK\n oin: {oin} # Optional. Example: 00000001003214345000\n name: {owner_name} # Optional. Example: John Doe\n email: {owner_email} # Optional. Example: johndoe@email.com\n role: {owner_role} # Optional. Example: Data Scientist.\n\nmodel-index:\n- name: {model_id} # Required. Example: CatClassifier.\n model: {model_uri} # Required. URI to a repository containing the model file.\n artifacts:\n - uri: {model_artifact_uri} # Optional. Example: \"https://github.com/MinBZK/poc-kijkdoos-wasm-models/raw/main/logres_iris/logreg_iris.onnx\"\n - content-type: {model_artifact_type} # Optional. Example: \"application/onnx\".\n - md5-checksum: {md5_checksum} # Optional. Example: \"120EA8A25E5D487BF68B5F7096440019\"\n parameters:\n - name: {parameter_name} # Optional. Example: \"epochs\".\n dtype: {parameter_dtype} # Optional. Example: \"int\".\n value: {parameter_value} # Optional. Example: 100.\n labels:\n - name: {label_name} # Optional. Example: \"gender\".\n dtype: {label_type} # Optional. Example: \"string\".\n value: {label_value} # Optional. Example: \"female\".\n results:\n - task:\n type: {task_type} # Required. Example: image-classification.\n name: {task_name} # Optional. Example: Image Classification.\n datasets:\n - type: {dataset_type} # Required. Example: common_voice. Link to a repository containing the dataset\n name: {dataset_name} # Required. Example: \"Common Voice (French)\". A pretty name for the dataset.\n split: {split} # Optional. Example: \"train\".\n features:\n - {feature_name} # Optional. Example: \"gender\".\n revision: {dataset_version} # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb\n metrics:\n - type: {metric_type} # Required. Example: false-positive-rate. Use metric id from https://hf.co/metrics.\n name: {metric_name} # Required. Example: \"FPR wrt class 0 restricted to feature gender:0 and age:21\".\n dtype: {metric_dtype} # Required. Example: \"float\".\n value: {metric_value} # Required. Example: 0.75.\n labels:\n - name: {label_name} # Optional. Example: \"gender\".\n type: {label_type} # Optional. Example: \"feature\".\n dtype: {label_type} # Optional. Example: \"string\".\n value: {label_value} # Optional. Example: \"female\".\n measurements:\n # Bar plots should be able to capture SHAP and Robustness Toolbox from AI Verify.\n bar_plots:\n - type: {measurement_type} # Required. Example: \"SHAP\".\n name: {measurement_name} # Optional. Example: \"Mean Absolute Shap Values\".\n results:\n - name: {bar_name} # Required. The name of a bar.\n value: {bar_value} # Required. The corresponding value.\n # Graph plots should be able to capture graph based measurements such as partial dependence and accumulated local effect.\n graph_plots:\n - type: {measurement_type} # Required. 
Example: \"partial_dependence\".\n name: {measurement_name} # Optional. Example: \"Partial Dependence Plot\".\n # Results store the graph plot data. So far all plots are dependent on a combination of a specific class (sometimes) and feature (always).\n # For example partial dependence plots are made for each feature and class.\n results:\n - class: {class_name} # Optional. Name of the output class the graph depends on.\n feature: {feature_name} # Required. Name of the feature the graph depends on.\n data:\n - x_value: {x_value} # Required. The x value of the graph data.\n y_value: {y_value} # Required. The y value of the graph data.\n
"},{"location":"projects/tad/reporting-standard/0.1a2/#assessment-card","title":"Assessment Card","text":"name: {assessment_name} # Required. Example: IAMA.\ndate: {assessment_date} # Required. Example: 25-03-2025.\ncontents:\n - question: {question_text} # Required. Example: \"Question 1: ...\".\n answer: {answer_text} # Required. Example: \"Answer: ...\".\n remarks: {remarks_text} # Optional. Example: \"Remarks: ...\".\n authors: # Optional. Example: \"['John', 'Peter']\".\n - name: {author_name}\n timestamp: {timestamp} # Optional. Example: 1711630721.\n
"},{"location":"projects/tad/reporting-standard/0.1a2/#schema","title":"Schema","text":"JSON schema will be added when we publish the first beta version.
"},{"location":"projects/tad/reporting-standard/0.1a2/#changelog","title":"Changelog","text":"Deviation from the Hugging Face specification is in the License field. Hugging Face only accepts dataset id's from Hugging Face license list while we accept any license from Open Source License List.\u00a0\u21a9\u21a9
Deviation from the Hugging Face specification is in the model_index:results:dataset
field. Hugging Face only accepts one dataset, while we accept a list of datasets.\u00a0\u21a9\u21a9
Deviation from the Hugging Face specification is in the Dataset Type field. Hugging Face only accepts dataset ids from Hugging Face datasets, while we also allow any URL pointing to the dataset.\u00a0\u21a9\u21a9
For this extension to work, relevant metrics (such as false positive rate) have to be added to the Hugging Face metrics; possibly this can be done in our organizational namespace.\u00a0\u21a9\u21a9
This document describes the Transparency of Algorithmic Decision making (TAD) Reporting Standard.
For reproducibility, governance, auditing and sharing of algorithmic systems it is essential to have a reporting standard so that information about an algorithmic system can be shared. This reporting standard describes how information about the different phases of an algorithm's life cycle can be reported. It contains, among other things, descriptive information combined with information about the technical tests and assessments applied.
Disclaimer
The TAD Reporting Standard is work in progress. This means that the current standard is probably suboptimal and will change significantly in future versions.
"},{"location":"projects/tad/reporting-standard/0.1a3/#introduction","title":"Introduction","text":"Inspired by Model Cards for Model Reporting and Papers with Code Model Index this standard almost 1 2 3 4 extends the Hugging Face model card metadata specification to allow for:
metrics_field
from the Hugging Face metadata specification.measurements
.assessments
.Following Hugging Face, this proposed standard will be written in yaml.
This standard does not contain all fields present in the Hugging Face metadata specification. The fields that are optional in the Hugging Face specification and are specific to the Hugging Face interface are omitted.
Another difference is that we divide our implementation into three separate parts.
system_card
, containing information about a group of ML-models which accomplish a specific task.model_card
, containing information about a specific data science model.assessment_card
, containing information about a regulatory assessment.Include statements
These model_card
s and assessment_card
s can be included verbatim into a system_card
, or referenced with an !include
statement, allowing for minimal cards to be compact in a single file. Extensive cards can be split up for readability and maintainability. Our standard allows for the !include
to be used anywhere.
The standard will be written in yaml. Example yaml files are given in the next section. The standard defines three cards: a system_card
, a model_card
and an assessment_card
. A system_card
contains information about an algorithmic system. It can have multiple models and each of these models should have a model_card
. Regulatory assessments can be processed in an assessment_card
. Note that model_card
's and assessment_card
's can be included directly into the system_card
or can be included as separate yaml files with help of a yaml-include mechanism. For clarity the latter is preferred and is also used in the examples in the next section.
system_card
","text":"A system_card
contains the following information.
schema_version
(REQUIRED, string). Version of the schema used, for example \"0.1a2\".name
(OPTIONAL, string). Name used to describe the system.upl
(OPTIONAL, string). If this algorithm is part of a product offered by the Dutch Government, it should contain a URI from the Uniform Product List.owners
(list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the system. If oin
is NOT provided this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.description
(OPTIONAL, string). A short description of the system.
labels
(OPTIONAL, list). This field allows storing meta information about a system. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). Name of the label.value
(OPTIONAL, string). Value of the label.status
(OPTIONAL, string). The status of the system. For example the status can be \"production\".
publication_category
(OPTIONAL, enum[string]). The publication category of the algorithm should be chosen from [\"high_risk\", \"other\"]
.begin_date
(OPTIONAL, string). The first date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.end_date
(OPTIONAL, string). The last date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.goal_and_impact
(OPTIONAL, string). The purpose of the system and the impact it has on citizens and companies.considerations
(OPTIONAL, string). The pros and cons of using the system.risk_management
(OPTIONAL, string). Description of the risks associated with the system.human_intervention
(OPTIONAL, string). A description of the extent to which there is human involvement in the system.legal_base
(OPTIONAL, list). If there exists a legal base for the process the system is embedded in, this field can be filled in with the relevant laws. There can be multiple legal bases. For each legal base the following fields are present.name
(OPTIONAL, string). Name of the law.link
(OPTIONAL, string). URI pointing towards the contents of the law.used_data
(OPTIONAL, string). An overview of the data that is used in the system.technical_design
(OPTIONAL, string). Description of how the system works.external_providers
(OPTIONAL, list[string]). Name of an external provider, if relevant. There can be multiple external providers.references
(OPTIONAL, list[string]). Additional reference URIs pointing to relevant information about the system.models
(OPTIONAL, list[ModelCard]). A list of model cards (as defined below) or !include
s of a yaml file containing a model card. This model card can for example be a model card described in the next section or a model card from Hugging Face. There can be multiple model cards, meaning multiple models are used.assessments
(OPTIONAL, list[AssessmentCard]). A list of assessment cards (as defined below) or !include
s of a yaml file containing an assessment card, as described in the next section. There can be multiple assessment cards, meaning multiple assessments were performed.model_card
","text":"A model_card
contains the following information.
language
(OPTIONAL, list[string]). If relevant, the natural languages the model supports in ISO 639. There can be multiple languages.license
(REQUIRED, string). Any license from the open source license list 1. If the license is NOT present in the license list this field must be set to 'other' and the following two fields will be REQUIRED.
license_name
(string). An id for the license.license_link
(string). A link to a file of that name inside the repo, or a URL to a remote file containing the license contents.tags
(OPTIONAL, list[string]). Tags with keywords to describe the project. There can be multiple tags.
owners
(list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the model. If oin
is NOT provided this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.There can be multiple models. For each model the following fields are present.
name
(REQUIRED, string). The name of the model.model
(REQUIRED, string). A URI pointing to a repository containing the model file.artifacts
(OPTIONAL, list). A list of artifacts.
uri
(OPTIONAL, string). URI referring to a relevant model artifact.content-type
(OPTIONAL, string). The type of the artifact, following the Content-Type format. A recognized value is \"application/onnx\", referring to an ONNX representation of the model.md5-checksum
(OPTIONAL, string). MD5 checksum for the content of the file.parameters
(list). There can be multiple parameters. For each parameter the following fields are present.
name
(REQUIRED, string). The name of the parameter, for example \"epochs\".dtype
(OPTIONAL, string). The datatype of the parameter, for example \"int\".value
(OPTIONAL, string). The value of the parameter, for example 100.labels
(list). This field allows storing meta information about a parameter. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the label.dtype
(OPTIONAL, string). The datatype of the label. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the label. If name
is set, this field is REQUIRED.results
(list). There can be multiple results. For each result the following fields are present.
task
(OPTIONAL, list).
task_type
(REQUIRED, string). The task of the model, for example \"object-classification\".task_name
(OPTIONAL, string). A pretty name for the model tasks, for example \"Object Classification\".datasets
(list). There can be multiple datasets 2. For each dataset the following fields are present.
type
(REQUIRED, string). The type of the dataset, can be a dataset id from Hugging Face datasets or any other link to a repository containing the dataset3, for example \"common_voice\".name
(REQUIRED, string). A pretty name for the dataset, for example \"Common Voice (French)\".split
(OPTIONAL, string). The split of the dataset, for example \"train\".features
(OPTIONAL, list[string]). List of feature names.revision
(OPTIONAL, string). Version of the dataset, for example 5503434ddd753f426f4b38109466949a1217c2bb.metrics
(list). There can be multiple metrics. For each metric the following fields are present.
type
(REQUIRED, string). A metric-id from Hugging Face metrics4, for example accuracy.name
(REQUIRED, string). A descriptive name of the metric. For example \"false positive rate\" is not a descriptive name, but \"training false positive rate w.r.t class x\" is.dtype
(REQUIRED, string). The data type of the metric, for example float
.value
(REQUIRED, string). The value of the metric.labels
(list). This field allows storing meta information about a metric. Metrics can, for example, be computed on subgroups of specific features: one can compute the accuracy for examples where the feature \"gender\" is set to \"male\". There can be multiple subgroups, which means that the metric is computed on the intersection of those subgroups. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the feature. For example: \"gender\".type
(OPTIONAL, string). The type of the label. Can for example be set to \"feature\" or \"output_class\". If name
is set, this field is REQUIRED.dtype
(OPTIONAL, string). The datatype of the feature, for example float
. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the feature. If name
is set, this field is REQUIRED. For example: \"male\".measurements
.
bar_plots
(list). The purpose of this field is to capture bar-plot-like measurements, for example SHAP values. There can be multiple bar plots. For each bar plot the following fields are present.
type
(REQUIRED, string). The type of bar plot, for example \"SHAP\".name
(OPTIONAL, string). A pretty name for the plot, for example \"Mean Absolute SHAP Values\".results
(list). The contents of the bar plot. A result represents a bar. There can be multiple results. For each result the following fields are present.name
(REQUIRED, string). The name of the bar.value
(REQUIRED, float). The value of the corresponding bar.graph_plots
(list). The purpose of this field is to capture graph-plot-like measurements, such as partial dependence plots. There can be multiple graph plots. For each graph plot the following fields are present.
type
(REQUIRED, string). The type of the graph plot, for example \"partial_dependence\".name
(OPTIONAL, string). A pretty name of the graph, for example \"Partial Dependence Plot\".results
(list). Results contains the graph plot data. Each graph can depend on a specific output class and feature. There can be multiple results. For each result the following fields are present.class
(OPTIONAL, string/int/float/bool). The output class name that the graph corresponds to. This field is not always present.feature
(REQUIRED, string). The feature the graph corresponds to. This is required, since all relevant graphs are dependent on features.data
(list).x_value
(REQUIRED, float). The $x$-value of the graph.y_value
(REQUIRED, float). The $y$-value of the graph.assessment_card
","text":"An assessment_card
contains the following information.
name
(REQUIRED, string). The name of the assessment.date
(REQUIRED, string). The date at which the assessment is completed. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.contents
(list). There can be multiple items in contents. For each item the following fields are present:
question
(REQUIRED, string). A question.answer
(REQUIRED, string). An answer.remarks
(OPTIONAL, string). A field to put relevant discussion remarks in.authors
. There can be multiple names. For each name the following field is present.name
(OPTIONAL, string). The name of the author of the question.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of the answer. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, i.e. 2024-04-16T16:48:14Z
.version: {system_card_version} # Optional. Example: \"0.1a1\"\nname: {system_name} # Optional. Example: \"AangifteVertrekBuitenland\"\nupl: {upl_uri} # Optional. Example: https://standaarden.overheid.nl/owms/terms/AangifteVertrekBuitenland\nowners:\n- oin: {oin} # Optional. Example: 00000001003214345000\n organization: {organization_name} # Optional if oin is provided, Required otherwise. Example: BZK\n name: {owner_name} # Optional. Example: John Doe\n email: {owner_email} # Optional. Example: johndoe@email.com\n role: {owner_role} # Optional. Example: Data Scientist.\ndescription: {system_description} # Optional. Short description of the system.\nlabels: # Optional. Labels to store metadata about the system.\n- name: {label_name} # Optional.\n value: {label_value} # Optional.\nstatus: {system_status} # Optional. Example: \"production\".\npublication_category: {system_publication_cat} # Optional. Example: \"high_risk\".\nbegin_date: {system_begin_date} # Optional. Example: 2025-01-01.\nend_date: {system_end_date} # Optional. Example: 2025-12-01.\ngoal_and_impact: {system_goal_and_impact} # Optional. Goal and impact of the system.\nconsiderations: {system_considerations} # Optional. Considerations about the system.\nrisk_management: {system_risk_management} # Optional. Description of risks associated with the system.\nhuman_intervention: {system_human_intervention} # Optional. Description of human involvement in the system.\nlegal_base:\n- name: {law_name} # Optional. Example: \"AVG\".\n link: {law_uri} # Optional. Example: \"https://eur-lex.europa.eu/legal-content/NL/TXT/HTML/?uri=CELEX:31995L0046\".\nused_data: {system_used_data} # Optional. Description of the data used by the system.\ntechnical_design: {technical_design} # Optional. Description of the technical design of the system.\nexternal_providers:\n- {system_external_provider} # Optional. Reference to used external providers.\nreferences:\n- {reference_uri} # Optional. Example: URI to codebase.\n\nmodels:\n- !include {model_card_uri} # Optional. Example: cat_classifier_model.yaml.\n\nassessments:\n- !include {assessment_card_uri} # Optional. Example: iama.yaml.\n
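The UTC (Z) ISO 8601 timestamps preferred in this version can be generated, for example, as follows (a minimal Python sketch):

```python
# Sketch: producing the preferred UTC (Z) ISO 8601 timestamp format.
from datetime import datetime, timezone

timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
print(timestamp)  # e.g. 2024-04-16T16:48:14Z
```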
"},{"location":"projects/tad/reporting-standard/0.1a3/#model-card","title":"Model Card","text":"language:\n - {lang_0} # Optional. Example nl.\nlicense: {license} # Required. Example: Apache-2.0 or any license SPDX ID from https://opensource.org/license or \"other\".\nlicense_name: {license_name} # Optional if license != other, Required otherwise. Example: 'my-license-1.0'\nlicense_link: {license_link} # Optional if license != other, Required otherwise. Specify \"LICENSE\" or \"LICENSE.md\" to link to a file of that name inside the repo, or a URL to a remote file.\ntags:\n- {tag_0} # Optional. Example: audio\n- {tag_1} # Optional. Example: automatic-speech-recognition\nowners:\n- organization: {organization_name} # Required. Example: BZK\n oin: {oin} # Optional. Example: 00000001003214345000\n name: {owner_name} # Optional. Example: John Doe\n email: {owner_email} # Optional. Example: johndoe@email.com\n role: {owner_role} # Optional. Example: Data Scientist.\n\nmodel-index:\n- name: {model_id} # Required. Example: CatClassifier.\n model: {model_uri} # Required. URI to a repository containing the model file.\n artifacts:\n - uri: {model_artifact_uri} # Optional. Example: \"https://github.com/MinBZK/poc-kijkdoos-wasm-models/raw/main/logres_iris/logreg_iris.onnx\"\n - content-type: {model_artifact_type} # Optional. Example: \"application/onnx\".\n - md5-checksum: {md5_checksum} # Optional. Example: \"120EA8A25E5D487BF68B5F7096440019\"\n parameters:\n - name: {parameter_name} # Optional. Example: \"epochs\".\n dtype: {parameter_dtype} # Optional. Example: \"int\".\n value: {parameter_value} # Optional. Example: 100.\n labels:\n - name: {label_name} # Optional. Example: \"gender\".\n dtype: {label_type} # Optional. Example: \"string\".\n value: {label_value} # Optional. Example: \"female\".\n results:\n - task:\n type: {task_type} # Required. Example: image-classification.\n name: {task_name} # Optional. Example: Image Classification.\n datasets:\n - type: {dataset_type} # Required. Example: common_voice. Link to a repository containing the dataset\n name: {dataset_name} # Required. Example: \"Common Voice (French)\". A pretty name for the dataset.\n split: {split} # Optional. Example: \"train\".\n features:\n - {feature_name} # Optional. Example: \"gender\".\n revision: {dataset_version} # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb\n metrics:\n - type: {metric_type} # Required. Example: false-positive-rate. Use metric id from https://hf.co/metrics.\n name: {metric_name} # Required. Example: \"FPR wrt class 0 restricted to feature gender:0 and age:21\".\n dtype: {metric_dtype} # Required. Example: \"float\".\n value: {metric_value} # Required. Example: 0.75.\n labels:\n - name: {label_name} # Optional. Example: \"gender\".\n type: {label_type} # Optional. Example: \"feature\".\n dtype: {label_type} # Optional. Example: \"string\".\n value: {label_value} # Optional. Example: \"female\".\n measurements:\n # Bar plots should be able to capture SHAP and Robustness Toolbox from AI Verify.\n bar_plots:\n - type: {measurement_type} # Required. Example: \"SHAP\".\n name: {measurement_name} # Optional. Example: \"Mean Absolute Shap Values\".\n results:\n - name: {bar_name} # Required. The name of a bar.\n value: {bar_value} # Required. The corresponding value.\n # Graph plots should be able to capture graph based measurements such as partial dependence and accumulated local effect.\n graph_plots:\n - type: {measurement_type} # Required. 
Example: \"partial_dependence\".\n name: {measurement_name} # Optional. Example: \"Partial Dependence Plot\".\n # Results store the graph plot data. So far all plots are dependent on a combination of a specific class (sometimes) and feature (always).\n # For example partial dependence plots are made for each feature and class.\n results:\n - class: {class_name} # Optional. Name of the output class the graph depends on.\n feature: {feature_name} # Required. Name of the feature the graph depends on.\n data:\n - x_value: {x_value} # Required. The x value of the graph data.\n y_value: {y_value} # Required. The y value of the graph data.\n
"},{"location":"projects/tad/reporting-standard/0.1a3/#assessment-card","title":"Assessment Card","text":"name: {assessment_name} # Required. Example: IAMA.\ndate: {assessment_date} # Required. Example: 25-03-2025.\ncontents:\n - question: {question_text} # Required. Example: \"Question 1: ...\".\n answer: {answer_text} # Required. Example: \"Answer: ...\".\n remarks: {remarks_text} # Optional. Example: \"Remarks: ...\".\n authors: # Optional. Example: \"['John', 'Peter']\".\n - name: {author_name}\n timestamp: {timestamp} # Optional. Example: 2024-04-16T16:48:14Z.\n
"},{"location":"projects/tad/reporting-standard/0.1a3/#schema","title":"Schema","text":"JSON schema will be added when we publish the first beta version.
"},{"location":"projects/tad/reporting-standard/0.1a3/#changelog","title":"Changelog","text":"Deviation from the Hugging Face specification is in the License field. Hugging Face only accepts dataset id's from Hugging Face license list while we accept any license from Open Source License List.\u00a0\u21a9\u21a9
Deviation from the Hugging Face specification is in the model_index:results:dataset
field. Hugging Face only accepts one dataset, while we accept a list of datasets.\u00a0\u21a9\u21a9
Deviation from the Hugging Face specification is in the Dataset Type field. Hugging Face only accepts dataset ids from Hugging Face datasets, while we also allow any URL pointing to the dataset.\u00a0\u21a9\u21a9
For this extension to work, relevant metrics (such as false positive rate) have to be added to the Hugging Face metrics; possibly this can be done in our organizational namespace.\u00a0\u21a9\u21a9
This document describes the Transparency of Algorithmic Decision making (TAD) Reporting Standard.
For reproducibility, governance, auditing and sharing of algorithmic systems it is essential to have a reporting standard so that information about an algorithmic system can be shared. This reporting standard describes how information about the different phases of an algorithm's life cycle can be reported. It contains, among other things, descriptive information combined with information about the technical tests and assessments applied.
Disclaimer
The TAD Reporting Standard is work in progress. This means that the current standard is probably suboptimal and will change significantly in future versions.
"},{"location":"projects/tad/reporting-standard/0.1a4/#introduction","title":"Introduction","text":"Inspired by Model Cards for Model Reporting and Papers with Code Model Index this standard almost 1 2 3 4 extends the Hugging Face model card metadata specification to allow for:
metrics_field
from the Hugging Face metadata specification.measurements
.assessments
.Following Hugging Face, this proposed standard will be written in yaml.
This standard does not contain all fields present in the Hugging Face metadata specification. The fields that are optional in the Hugging Face specification and are specific to the Hugging Face interface are omitted.
Another difference is that we divide our implementation into three separate parts.
system_card
, containing information about a group of ML-models which accomplish a specific task.model_card
, containing information about a specific data science model.assessment_card
, containing information about a regulatory assessment.Include statements
These model_card
s and assessment_card
s can be included verbatim into a system_card
, or referenced with an !include
statement, allowing for minimal cards to be compact in a single file. Extensive cards can be split up for readability and maintainability. Our standard allows for the !include
to be used anywhere.
The standard will be written in yaml. Example yaml files are given in the next section. The standard defines three cards: a system_card
, a model_card
and an assessment_card
. A system_card
contains information about an algorithmic system. It can have multiple models and each of these models should have a model_card
. Regulatory assessments can be processed in an assessment_card
. Note that model_card
's and assessment_card
's can be included directly into the system_card
or can be included as separate yaml files with help of a yaml-include mechanism. For clarity the latter is preferred and is also used in the examples in the next section.
system_card
","text":"A system_card
contains the following information.
schema_version
(REQUIRED, string). Version of the schema used, for example \"0.1a2\".provenance
(OPTIONAL). In case this System Card is generated from another source file, this field can capture the historical context of the contents of this System Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this System Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, i.e. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of the person who initiated the transformations.name
(OPTIONAL, string). Name used to describe the system.
upl
(OPTIONAL, string). If this algorithm is part of a product offered by the Dutch Government, it should contain a URI from the Uniform Product List.owners
(list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the system. If oin
is NOT provided this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.description
(OPTIONAL, string). A short description of the system.
labels
(OPTIONAL, list). This field allows storing meta information about a system. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). Name of the label.value
(OPTIONAL, string). Value of the label.status
(OPTIONAL, string). The status of the system. For example the status can be \"production\".
publication_category
(OPTIONAL, enum[string]). The publication category of the algorithm should be chosen from [\"high_risk\", \"other\"]
.begin_date
(OPTIONAL, string). The first date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.end_date
(OPTIONAL, string). The last date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.goal_and_impact
(OPTIONAL, string). The purpose of the system and the impact it has on citizens and companies.considerations
(OPTIONAL, string). The pros and cons of using the system.risk_management
(OPTIONAL, string). Description of the risks associated with the system.human_intervention
(OPTIONAL, string). A description of the extent to which there is human involvement in the system.legal_base
(OPTIONAL, list). If there exists a legal base for the process the system is embedded in, this field can be filled in with the relevant laws. There can be multiple legal bases. For each legal base the following fields are present.name
(OPTIONAL, string). Name of the law.link
(OPTIONAL, string). URI pointing towards the contents of the law.used_data
(OPTIONAL, string). An overview of the data that is used in the system.technical_design
(OPTIONAL, string). Description of how the system works.external_providers
(OPTIONAL, list[string]). Name of an external provider, if relevant. There can be multiple external providers.references
(OPTIONAL, list[string]). Additional reference URIs pointing to relevant information about the system.models
(OPTIONAL, list[ModelCard]). A list of model cards (as defined below) or !include
s of a yaml file containing a model card. This model card can for example be a model card described in the next section or a model card from Hugging Face. There can be multiple model cards, meaning multiple models are used.assessments
(OPTIONAL, list[AssessmentCard]). A list of assessment cards (as defined below) or !include
s of a yaml file containing an assessment card, as described in the next section. There can be multiple assessment cards, meaning multiple assessments were performed.model_card
","text":"A model_card
contains the following information.
provenance
(OPTIONAL). In case this Model Card is generated from another source file, this field can capture the historical context of the contents of this Model Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this Model Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, i.e. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of the person who initiated the transformations.language
(OPTIONAL, list[string]). If relevant, the natural languages the model supports in ISO 639. There can be multiple languages.
license
(REQUIRED, string). Any license from the open source license list 1. If the license is NOT present in the license list this field must be set to 'other' and the following two fields will be REQUIRED.
license_name
(string). An id for the license.license_link
(string). A link to a file of that name inside the repo, or a URL to a remote file containing the license contents.tags
(OPTIONAL, list[string]). Tags with keywords to describe the project. There can be multiple tags.
owners
(list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the model. If oin
is NOT provided this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.There can be multiple models. For each model the following fields are present.
name
(REQUIRED, string). The name of the model.model
(REQUIRED, string). A URI pointing to a repository containing the model file.artifacts
(OPTIONAL, list). A list of artifacts.
uri
(OPTIONAL, string). URI referring to a relevant model artifact.content-type
(OPTIONAL, string). The type of the artifact, following the Content-Type format. A recognized value is \"application/onnx\", referring to an ONNX representation of the model.md5-checksum
(OPTIONAL, string). MD5 checksum for the content of the file.parameters
(list). There can be multiple parameters. For each parameter the following fields are present.
name
(REQUIRED, string). The name of the parameter, for example \"epochs\".dtype
(OPTIONAL, string). The datatype of the parameter, for example \"int\".value
(OPTIONAL, string). The value of the parameter, for example 100.labels
(list). This field allows storing meta information about a parameter. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the label.dtype
(OPTIONAL, string). The datatype of the label. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the label. If name
is set, this field is REQUIRED.results
(list). There can be multiple results. For each result the following fields are present.
task
(OPTIONAL, list).
task_type
(REQUIRED, string). The task of the model, for example \"object-classification\".task_name
(OPTIONAL, string). A pretty name for the model tasks, for example \"Object Classification\".datasets
(list). There can be multiple datasets 2. For each dataset the following fields are present.
type
(REQUIRED, string). The type of the dataset, can be a dataset id from Hugging Face datasets or any other link to a repository containing the dataset3, for example \"common_voice\".name
(REQUIRED, string). A pretty name for the dataset, for example \"Common Voice (French)\".split
(OPTIONAL, string). The split of the dataset, for example \"train\".features
(OPTIONAL, list[string]). List of feature names.revision
(OPTIONAL, string). Version of the dataset, for example 5503434ddd753f426f4b38109466949a1217c2bb.metrics
(list). There can be multiple metrics. For each metric the following fields are present.
type
(REQUIRED, string). A metric-id from Hugging Face metrics4, for example accuracy.name
(REQUIRED, string). A descriptive name of the metric. For example \"false positive rate\" is not a descriptive name, but \"training false positive rate w.r.t class x\" is.dtype
(REQUIRED, string). The data type of the metric, for example float
.value
(REQUIRED, string). The value of the metric.labels
(list). This field allows storing meta information about a metric. Metrics can, for example, be computed on subgroups of specific features: one can compute the accuracy for examples where the feature \"gender\" is set to \"male\". There can be multiple subgroups, which means that the metric is computed on the intersection of those subgroups. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the feature. For example: \"gender\".type
(OPTIONAL, string). The type of the label. Can for example be set to \"feature\" or \"output_class\". If name
is set, this field is REQUIRED.dtype
(OPTIONAL, string). The datatype of the feature, for example float
. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the feature. If name
is set, this field is REQUIRED. For example: \"male\".measurements
.
bar_plots
(list). The purpose of this field is to capture bar-plot-like measurements, for example SHAP values. There can be multiple bar plots. For each bar plot the following fields are present.
type
(REQUIRED, string). The type of bar plot, for example \"SHAP\".name
(OPTIONAL, string). A pretty name for the plot, for example \"Mean Absolute SHAP Values\".results
(list). The contents of the bar plot. A result represents a bar. There can be multiple results. For each result the following fields are present.name
(REQUIRED, string). The name of the bar.value
(REQUIRED, float). The value of the corresponding bar.graph_plots
(list). The purpose of this field is to capture graph-plot-like measurements, such as partial dependence plots. There can be multiple graph plots. For each graph plot the following fields are present.
type
(REQUIRED, string). The type of the graph plot, for example \"partial_dependence\".name
(OPTIONAL, string). A pretty name of the graph, for example \"Partial Dependence Plot\".results
(list). Results contains the graph plot data. Each graph can depend on a specific output class and feature. There can be multiple results. For each result the following fields are present.class
(OPTIONAL, string/int/float/bool). The output class name that the graph corresponds to. This field is not always present.feature
(REQUIRED, string). The feature the graph corresponds to. This is required, since all relevant graphs are dependent on features.data
(list).x_value
(REQUIRED, float). The $x$-value of the graph.y_value
(REQUIRED, float). The $y$-value of the graph.assessment_card
","text":"An assessment_card
contains the following information.
provenance
(OPTIONAL). In case this Assessment Card is generated from another source file, this field can capture the historical context of the contents of this Assessment Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this Assessment Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, i.e. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of the person who initiated the transformations.name
(REQUIRED, string). The name of the assessment.
date
(REQUIRED, string). The date at which the assessment is completed. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.contents
(list). There can be multiple items in contents. For each item the following fields are present:
question
(REQUIRED, string). A question.answer
(REQUIRED, string). An answer.remarks
(OPTIONAL, string). A field to put relevant discussion remarks in.authors
. There can be multiple names. For each name the following field is present.name
(OPTIONAL, string). The name of the author of the question.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of the answer. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, i.e. 2024-04-16T16:48:14Z
.version: {system_card_version} # Optional. Example: \"0.1a1\"\nprovenance: # Optional.\n git_commit_hash: {git_commit_hash} # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb\n timestamp: {modification_timestamp} # Optional. Example: 2024-04-16T16:48:14Z.\n uri: {modification_uri} # Optional. Example: https://github.com/MinBZK/tad-conversion-tool\n author: {modification_author} # Optional. Example: John Doe\nname: {system_name} # Optional. Example: \"AangifteVertrekBuitenland\"\nupl: {upl_uri} # Optional. Example: https://standaarden.overheid.nl/owms/terms/AangifteVertrekBuitenland\nowners:\n- oin: {oin} # Optional. Example: 00000001003214345000\n organization: {organization_name} # Optional if oin is provided, Required otherwise. Example: BZK\n name: {owner_name} # Optional. Example: John Doe\n email: {owner_email} # Optional. Example: johndoe@email.com\n role: {owner_role} # Optional. Example: Data Scientist.\ndescription: {system_description} # Optional. Short description of the system.\nlabels: # Optional. Labels to store metadata about the system.\n- name: {label_name} # Optional.\n value: {label_value} # Optional.\nstatus: {system_status} # Optional. Example: \"production\".\npublication_category: {system_publication_cat} # Optional. Example: \"high_risk\".\nbegin_date: {system_begin_date} # Optional. Example: 2025-01-01.\nend_date: {system_end_date} # Optional. Example: 2025-12-01.\ngoal_and_impact: {system_goal_and_impact} # Optional. Goal and impact of the system.\nconsiderations: {system_considerations} # Optional. Considerations about the system.\nrisk_management: {system_risk_management} # Optional. Description of risks associated with the system.\nhuman_intervention: {system_human_intervention} # Optional. Description of human involvement in the system.\nlegal_base:\n- name: {law_name} # Optional. Example: \"AVG\".\n link: {law_uri} # Optional. Example: \"https://eur-lex.europa.eu/legal-content/NL/TXT/HTML/?uri=CELEX:31995L0046\".\nused_data: {system_used_data} # Optional. Description of the data used by the system.\ntechnical_design: {technical_design} # Optional. Description of the technical design of the system.\nexternal_providers:\n- {system_external_provider} # Optional. Reference to used external providers.\nreferences:\n- {reference_uri} # Optional. Example: URI to codebase.\n\nmodels:\n- !include {model_card_uri} # Optional. Example: cat_classifier_model.yaml.\n\nassessments:\n- !include {assessment_card_uri} # Optional. Example: iama.yaml.\n
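A generating tool might fill the provenance block introduced in this version along these lines (a minimal sketch assuming generation happens inside a git checkout; the uri and author values are copied from the example above and are illustrative):

```python
# Sketch: filling the provenance block from the current git checkout.
# The uri and author values are illustrative.
import subprocess
from datetime import datetime, timezone

provenance = {
    "git_commit_hash": subprocess.check_output(
        ["git", "rev-parse", "HEAD"], text=True
    ).strip(),
    "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    "uri": "https://github.com/MinBZK/tad-conversion-tool",
    "author": "John Doe",
}
```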
"},{"location":"projects/tad/reporting-standard/0.1a4/#model-card","title":"Model Card","text":"provenance: # Optional.\n git_commit_hash: {git_commit_hash} # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb\n timestamp: {modification_timestamp} # Optional. Example: 2024-04-16T16:48:14Z.\n uri: {modification_uri} # Optional. Example: https://github.com/MinBZK/tad-conversion-tool\n author: {modification_author} # Optional. Example: John Doe\nlanguage:\n - {lang_0} # Optional. Example nl.\nlicense: {license} # Required. Example: Apache-2.0 or any license SPDX ID from https://opensource.org/license or \"other\".\nlicense_name: {license_name} # Optional if license != other, Required otherwise. Example: 'my-license-1.0'\nlicense_link: {license_link} # Optional if license != other, Required otherwise. Specify \"LICENSE\" or \"LICENSE.md\" to link to a file of that name inside the repo, or a URL to a remote file.\ntags:\n- {tag_0} # Optional. Example: audio\n- {tag_1} # Optional. Example: automatic-speech-recognition\nowners:\n- organization: {organization_name} # Required. Example: BZK\n oin: {oin} # Optional. Example: 00000001003214345000\n name: {owner_name} # Optional. Example: John Doe\n email: {owner_email} # Optional. Example: johndoe@email.com\n role: {owner_role} # Optional. Example: Data Scientist.\n\nmodel-index:\n- name: {model_id} # Required. Example: CatClassifier.\n model: {model_uri} # Required. URI to a repository containing the model file.\n artifacts:\n - uri: {model_artifact_uri} # Optional. Example: \"https://github.com/MinBZK/poc-kijkdoos-wasm-models/raw/main/logres_iris/logreg_iris.onnx\"\n - content-type: {model_artifact_type} # Optional. Example: \"application/onnx\".\n - md5-checksum: {md5_checksum} # Optional. Example: \"120EA8A25E5D487BF68B5F7096440019\"\n parameters:\n - name: {parameter_name} # Optional. Example: \"epochs\".\n dtype: {parameter_dtype} # Optional. Example: \"int\".\n value: {parameter_value} # Optional. Example: 100.\n labels:\n - name: {label_name} # Optional. Example: \"gender\".\n dtype: {label_type} # Optional. Example: \"string\".\n value: {label_value} # Optional. Example: \"female\".\n results:\n - task:\n type: {task_type} # Required. Example: image-classification.\n name: {task_name} # Optional. Example: Image Classification.\n datasets:\n - type: {dataset_type} # Required. Example: common_voice. Link to a repository containing the dataset\n name: {dataset_name} # Required. Example: \"Common Voice (French)\". A pretty name for the dataset.\n split: {split} # Optional. Example: \"train\".\n features:\n - {feature_name} # Optional. Example: \"gender\".\n revision: {dataset_version} # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb\n metrics:\n - type: {metric_type} # Required. Example: false-positive-rate. Use metric id from https://hf.co/metrics.\n name: {metric_name} # Required. Example: \"FPR wrt class 0 restricted to feature gender:0 and age:21\".\n dtype: {metric_dtype} # Required. Example: \"float\".\n value: {metric_value} # Required. Example: 0.75.\n labels:\n - name: {label_name} # Optional. Example: \"gender\".\n type: {label_type} # Optional. Example: \"feature\".\n dtype: {label_type} # Optional. Example: \"string\".\n value: {label_value} # Optional. Example: \"female\".\n measurements:\n # Bar plots should be able to capture SHAP and Robustness Toolbox from AI Verify.\n bar_plots:\n - type: {measurement_type} # Required. Example: \"SHAP\".\n name: {measurement_name} # Optional. 
Example: \"Mean Absolute Shap Values\".\n results:\n - name: {bar_name} # Required. The name of a bar.\n value: {bar_value} # Required. The corresponding value.\n # Graph plots should be able to capture graph based measurements such as partial dependence and accumulated local effect.\n graph_plots:\n - type: {measurement_type} # Required. Example: \"partial_dependence\".\n name: {measurement_name} # Optional. Example: \"Partial Dependence Plot\".\n # Results store the graph plot data. So far all plots are dependent on a combination of a specific class (sometimes) and feature (always).\n # For example partial dependence plots are made for each feature and class.\n results:\n - class: {class_name} # Optional. Name of the output class the graph depends on.\n feature: {feature_name} # Required. Name of the feature the graph depends on.\n data:\n - x_value: {x_value} # Required. The x value of the graph data.\n y_value: {y_value} # Required. The y value of the graph data.\n
"},{"location":"projects/tad/reporting-standard/0.1a4/#assessment-card","title":"Assessment Card","text":"provenance: # Optional.\n git_commit_hash: {git_commit_hash} # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb\n timestamp: {modification_timestamp} # Optional. Example: 2024-04-16T16:48:14Z.\n uri: {modification_uri} # Optional. Example: https://github.com/MinBZK/tad-conversion-tool\n author: {modification_author} # Optional. Example: John Doe\nname: {assessment_name} # Required. Example: IAMA.\ndate: {assessment_date} # Required. Example: 25-03-2025.\ncontents:\n - question: {question_text} # Required. Example: \"Question 1: ...\".\n answer: {answer_text} # Required. Example: \"Answer: ...\".\n remarks: {remarks_text} # Optional. Example: \"Remarks: ...\".\n authors: # Optional. Example: \"['John', 'Peter']\".\n - name: {author_name}\n timestamp: {timestamp} # Optional. Example: 2024-04-16T16:48:14Z.\n
"},{"location":"projects/tad/reporting-standard/0.1a4/#schema","title":"Schema","text":"JSON schema will be added when we publish the first beta version.
"},{"location":"projects/tad/reporting-standard/0.1a4/#changelog","title":"Changelog","text":"Deviation from the Hugging Face specification is in the License field. Hugging Face only accepts dataset id's from Hugging Face license list while we accept any license from Open Source License List.\u00a0\u21a9\u21a9
Deviation from the Hugging Face specification is in the model_index:results:dataset
field. Hugging Face only accepts one dataset, while we accept a list of datasets.
Deviation from the Hugging Face specification is in the Dataset Type field. Hugging Face only accepts dataset IDs from Hugging Face datasets, while we also allow any URL pointing to the dataset.
For this extension to work, relevant metrics (such as false positive rate) have to be added to the Hugging Face metrics; possibly this can be done in our organizational namespace.
This document describes the Transparency of Algorithmic Decision making (TAD) Reporting Standard.
For reproducibility, governance, auditing and sharing of algorithmic systems it is essential to have a reporting standard so that information about an algorithmic system can be shared. This reporting standard describes how information about the different phases of an algorithm's life cycle can be reported. It contains, among other things, descriptive information combined with information about the technical tests and assessments applied.
Disclaimer
The TAD Reporting Standard is a work in progress. This means that the current standard is probably suboptimal and will change significantly in future versions.
"},{"location":"projects/tad/reporting-standard/0.1a5/#introduction","title":"Introduction","text":"Inspired by Model Cards for Model Reporting and Papers with Code Model Index this standard almost 1 2 3 4 extends the Hugging Face model card metadata specification to allow for:
metrics_field
from the Hugging Face metadata specification.measurements
.assessments
Following Hugging Face, this proposed standard will be written in YAML.
This standard does not contain all fields present in the Hugging Face metadata specification. The fields that are optional in the Hugging Face specification and are specific to the Hugging Face interface are omitted.
Another difference is that we divide our implementation into three separate parts.
system_card
, containing information about a group of ML-models which accomplish a specific task.model_card
, containing information about a specific data science model.assessment_card
, containing information about a regulatory assessment.Include statements
These model_card
s and assessment_card
s can be included verbatim into a system_card
, or referenced with an !include
statement, allowing for minimal cards to be compact in a single file. Extensive cards can be split up for readability and maintainability. Our standard allows for the !include
to be used anywhere.
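As a sketch (the model card values here are hypothetical), the two options look as follows inside a system_card:

models:
# Option 1: a model card included verbatim as a nested mapping.
- license: Apache-2.0
  tags:
  - example
# Option 2: an equivalent model card referenced from a separate file.
- !include cat_classifier_model.yaml

Both forms are equivalent; the !include variant keeps the system_card file itself compact.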
The standard will be written in YAML. Example YAML files are given in the next section. The standard defines three cards: a system_card
, a model_card
and an assessment_card
. A system_card
contains information about an algorithmic system. It can have multiple models and each of these models should have a model_card
. Regulatory assessments can be processed in an assessment_card
. Note that model_card
s and assessment_card
s can be included directly into the system_card
or can be included as separate YAML files with the help of a YAML-include mechanism. For clarity, the latter is preferred and is also used in the examples in the next section.
system_card
","text":"A system_card
contains the following information.
schema_version
(REQUIRED, string). Version of the schema used, for example \"0.1a2\".provenance
(OPTIONAL). In case this System Card is generated from another source file, this field can capture the historical context of the contents of this System Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this System Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, i.e. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of person that initiated the transformations.name
(OPTIONAL, string). Name used to describe the system.
upl
(OPTIONAL, string). If this algorithm is part of a product offered by the Dutch Government, it should contain a URI from the Uniform Product List.owners
(list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the model. If oin
is NOT provided, this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.description
(OPTIONAL, string). A short description of the system.
labels
(OPTIONAL, list). This field allows storing meta information about a system. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). Name of the label.value
(OPTIONAL, string). Value of the label.status
(OPTIONAL, string). The status of the system. For example the status can be \"production\".
publication_category
(OPTIONAL, enum[string]). The publication category of the algorithm should be chosen from [\"high_risk\", \"other\"]
.begin_date
(OPTIONAL, string). The first date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.end_date
(OPTIONAL, string). The last date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.goal_and_impact
(OPTIONAL, string). The purpose of the system and the impact it has on citizens and companies.considerations
(OPTIONAL, string). The pros and cons of using the system.risk_management
(OPTIONAL, string). Description of the risks associated with the system.human_intervention
(OPTIONAL, string). A description of to what extent there is human involvement in the system.legal_base
(OPTIONAL, list). If there exists a legal base for the process the system is embedded in, this field can be filled in with the relevant laws. There can be multiple legal bases. For each legal base the following fields are present.name
(OPTIONAL, string). Name of the law.link
(OPTIONAL, string). URI pointing towards the contents of the law.used_data
(OPTIONAL, string). An overview of the data that is used in the system.technical_design
(OPTIONAL, string). Description of how the system works.external_providers
(OPTIONAL, list). If relevant, these fields allow storing information on external providers. There can be multiple external providers.name
(OPTIONAL, string). Name of the external provider.version
(OPTIONAL, string). Version of the external provider reflecting its relation to previous versions.references
(OPTIONAL, list[string]). Additional reference URIs pointing to relevant information about the system.interaction_details
(OPTIONAL, list[string]). Explain how the AI system interacts with hardware or software, including other AI systems, or how the AI system can be used to interact with hardware or software.version_requirements
(OPTIONAL, list[string]). Describe the versions of the relevant software or firmware, and any requirements related to version updates.deployment_variants
(OPTIONAL, list[string]). Description of all the forms in which the AI system is placed on the market or put into service, such as software packages embedded into hardware, downloads, or APIs.hardware_requirements
(OPTIONAL, list[string]). Provide a description of the hardware on which the AI system must be run.product_markings
(OPTIONAL, list[string]). If the AI system is a component of products, photos, or illustrations, describe the external features, markings, and internal layout of those products.user_interface
(OPTIONAL, list). Provide information on the user interface provided to the user responsible for its operation.description
(OPTIONAL, string). A description of the provided user interface.link
(OPTIONAL, string). A link to the user interface can be included.snapshot
(OPTIONAL, string). A snapshot/screenshot of the user interface can be included with the use of a hyperlink.models
(OPTIONAL, list[ModelCard]). A list of model cards (as defined below) or !include
s of a YAML file containing a model card. This model card can for example be a model card described in the next section or a model card from Hugging Face. There can be multiple model cards, meaning multiple models are used.assessments
(OPTIONAL, list[AssessmentCard]). A list of assessment cards (as defined below) or !include
s of a YAML file containing an assessment card. This assessment card is an assessment card described in the next section. There can be multiple assessment cards, meaning multiple assessments were performed.model_card
","text":"A model_card
contains the following information.
provenance
(OPTIONAL). In case this Model Card is generated from another source file, this field can capture the historical context of the contents of this Model Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this Model Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, i.e. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of person that initiated the transformations.language
(OPTIONAL, list[string]). If relevant, the natural languages the model supports in ISO 639. There can be multiple languages.
license
(REQUIRED, string). Any license from the open source license list 1. If the license is NOT present in the license list this field must be set to 'other' and the following two fields will be REQUIRED.
license_name
(string). An id for the license.license_link
(string). A link to a file of that name inside the repo, or a URL to a remote file containing the license contents.tags
(OPTIONAL, list[string]). Tags with keywords to describe the project. There can be multiple tags.
owners
(list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the model. If oin
is NOT provided, this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.There can be multiple models. For each model the following fields are present.
name
(REQUIRED, string). The name of the model.model
(REQUIRED, string). A URI pointing to a repository containing the model file.artifacts
(OPTIONAL, list). A list of artifacts.
uri
(OPTIONAL, string). URI referring to a relevant model artifact.content-type
(OPTIONAL, string). The type of the artifact, following the Content-Type convention. Recognized values are \"application/onnx\", to refer to an ONNX representation of the model.md5-checksum
(OPTIONAL, string). Checksum for the content of the file.parameters
(list). There can be multiple parameters. For each parameter the following fields are present.
name
(REQUIRED, string). The name of the parameter, for example \"epochs\".dtype
(OPTIONAL, string). The datatype of the parameter, for example \"int\".value
(OPTIONAL, string). The value of the parameter, for example 100.labels
(list). This field allows storing meta information about a parameter. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the label.dtype
(OPTIONAL, string). The datatype of the feature. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the feature. If name
is set, this field is REQUIRED.results
(list). There can be multiple results. For each result the following fields are present.
task
(OPTIONAL, list).
task_type
(REQUIRED, string). The task of the model, for example \"object-classification\".task_name
(OPTIONAL, string). A pretty name for the model tasks, for example \"Object Classification\".datasets
(list). There can be multiple datasets 2. For each dataset the following fields are present.
type
(REQUIRED, string). The type of the dataset, which can be a dataset id from Hugging Face datasets or any other link to a repository containing the dataset3, for example \"common_voice\".name
(REQUIRED, string). A pretty name for the dataset, for example \"Common Voice (French)\".split
(OPTIONAL, string). The split of the dataset, for example \"train\".features
(OPTIONAL, list[string]). List of feature names.revision
(OPTIONAL, string). Version of the dataset, for example 5503434ddd753f426f4b38109466949a1217c2bb.metrics
(list). There can be multiple metrics. For each metric the following fields are present.
type
(REQUIRED, string). A metric-id from Hugging Face metrics4, for example accuracy.name
(REQUIRED, string). A descriptive name of the metric. For example \"false positive rate\" is not a descriptive name, but \"training false positive rate w.r.t class x\" is.dtype
(REQUIRED, string). The data type of the metric, for example float
.value
(REQUIRED, string). The value of the metric.labels
(list). This field allows storing meta information about a metric. Metrics can, for example, be computed on subgroups of specific features: one can compute the accuracy for examples where the feature \"gender\" is set to \"male\". There can be multiple subgroups, which means that the metric is computed on the intersection of those subgroups. There can be multiple labels. For each label the following fields are present. A filled-in sketch is given after the model card example in the next section.
name
(OPTIONAL, string). The name of the feature. For example: \"gender\".type
(OPTIONAL, string). The type of the label. Can for example be set to \"feature\" or \"output_class\". If name
is set, this field is REQUIRED.dtype
(OPTIONAL, string). The datatype of the feature, for example float
. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the feature. If name
is set, this field is REQUIRED. For example: \"male\".measurements
.
bar_plots
(list). The purpose of this field is to capture bar-plot-like measurements, for example SHAP values. There can be multiple bar plots. For each bar plot the following fields are present.
type
(REQUIRED, string). The type of bar plot, for example \"SHAP\".name
(OPTIONAL, string). A pretty name for the plot, for example \"Mean Absolute SHAP Values\".results
(list). The contents of the bar plot. A result represents a bar. There can be multiple results. For each result the following fields are present.name
(REQUIRED, string). The name of the bar.value
(REQUIRED, float). The value of the corresponding bar.graph_plots
(list). The purpose of this field is to capture graph-plot-like measurements, such as partial dependence plots. There can be multiple graph plots. For each graph plot the following fields are present.
type
(REQUIRED, string). The type of the graph plot, for example \"partial_dependence\".name
(OPTIONAL, string). A pretty name of the graph, for example \"Partial Dependence Plot\".results
(list). The results field contains the graph plot data. Each graph can depend on a specific output class and feature. There can be multiple results. For each result the following fields are present.class
(OPTIONAL, string/int/float/bool). The output class name that the graph corresponds to. This field is not always present.feature
(REQUIRED, string). The feature the graph corresponds to. This is required, since all relevant graphs are dependent on features.data
(list)x_value
(REQUIRED, float). The $x$-value of the graph.y_value
(REQUIRED, float). The $y$-value of the graph.assessment_card
","text":"An assessment_card
contains the following information.
provenance
(OPTIONAL). In case this Assessment Card is generated from another source file, this field can capture the historical context of the contents of this Assessment Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this Assessment Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, i.e. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of person that initiated the transformations.name
(REQUIRED, string). The name of the assessment.
date
(REQUIRED, string). The date at which the assessment is completed. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.contents
(list). There can be multiple items in contents. For each item the following fields are present:
question
(REQUIRED, string). A question.answer
(REQUIRED, string). An answer.remarks
(OPTIONAL, string). A field to put relevant discussion remarks in.authors
. There can be multiple names. For each name the following field is present.name
(OPTIONAL, string). The name of the author of the question.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of the answer. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, i.e. 2024-04-16T16:48:14Z
.version: {system_card_version} # Optional. Example: \"0.1a1\"\nprovenance: # Optional.\n git_commit_hash: {git_commit_hash} # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb\n timestamp: {modification_timestamp} # Optional. Example: 2024-04-16T16:48:14Z.\n uri: {modification_uri} # Optional. Example: https://github.com/MinBZK/tad-conversion-tool\n author: {modification_author} # Optional. Example: John Doe\nname: {system_name} # Optional. Example: \"AangifteVertrekBuitenland\"\nupl: {upl_uri} # Optional. Example: https://standaarden.overheid.nl/owms/terms/AangifteVertrekBuitenland\nowners:\n- oin: {oin} # Optional. Example: 00000001003214345000\n organization: {organization_name} # Optional if oin is provided, Required otherwise. Example: BZK\n name: {owner_name} # Optional. Example: John Doe\n email: {owner_email} # Optional. Example: johndoe@email.com\n role: {owner_role} # Optional. Example: Data Scientist.\ndescription: {system_description} # Optional. Short description of the system.\nlabels: # Optional. Labels to store metadata about the system.\n- name: {label_name} # Optional.\n value: {label_value} # Optional.\nstatus: {system_status} # Optional. Example: \"production\".\npublication_category: {system_publication_cat} # Optional. Example: \"impactful_algorithm\".\nbegin_date: {system_begin_date} # Optional. Example: 2025-01-01.\nend_date: {system_end_date} # Optional. Example: 2025-12-01.\ngoal_and_impact: {system_goal_and_impact} # Optional. Goal and impact of the system.\nconsiderations: {system_considerations} # Optional. Considerations about the system.\nrisk_management: {system_risk_management} # Optional. Description of risks associated with the system.\nhuman_intervention: {system_human_intervention} # Optional. Description of human involvement in the system.\nlegal_base:\n- name: {law_name} # Optional. Example: \"AVG\".\n link: {law_uri} # Optional. Example: \"https://eur-lex.europa.eu/legal-content/NL/TXT/HTML/?uri=CELEX:31995L0046\".\nused_data: {system_used_data} # Optional. Description of the data used by the system.\ntechnical_design: {technical_design} # Optional. Description of the technical design of the system.\nexternal_providers:\n- name: {name_external_provider} # Optional. Reference to used external providers.\n version: {version_external_provider} # Optional. Version of the external provider.\nreferences:\n- {reference_uri} # Optional. Example: URI to codebase.\ninteraction_details:\n- {system_interaction_details} # Optional. Example: \"GPS modules for location tracking\"\nversion_requirements:\n- {system_version_requirements} # Optional. Example: \">version2.1\"\ndeployment_variants:\n- {system_deployment_variants} # Optional. Example: \"Web Application\"\nhardware_requirements:\n- {system_hardware_requirements} # Optional. Example: \"8 cores, 16 threads CPU\"\nproduct_markings:\n- {system_product_markings} # Optional. Example: \"Model number in the info menu\"\nuser_interface:\n- description: {system_user_interface} # Optional. Example: \"web-based dashboard\"\n link: {system_user_interface_uri} # Optional. Example: \"http://example.com/content\"\n snapshot: {system_user_interface_snapshot_uri} # Optional. Example: \"http://example.com/snapshot.png\"\n\nmodels:\n- !include {model_card_uri} # Optional. Example: cat_classifier_model.yaml.\n\nassessments:\n- !include {assessment_card_uri} # Required. Example: iama.yaml.\n
"},{"location":"projects/tad/reporting-standard/0.1a5/#model-card","title":"Model Card","text":"provenance: # Optional.\n git_commit_hash: {git_commit_hash} # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb\n timestamp: {modification_timestamp} # Optional. Example: 2024-04-16T16:48:14Z.\n uri: {modification_uri} # Optional. Example: https://github.com/MinBZK/tad-conversion-tool\n author: {modification_author} # Optional. Example: John Doe\nlanguage:\n - {lang_0} # Optional. Example nl.\nlicense: {license} # Required. Example: Apache-2.0 or any license SPDX ID from https://opensource.org/license or \"other\".\nlicense_name: {license_name} # Optional if license != other, Required otherwise. Example: 'my-license-1.0'\nlicense_link: {license_link} # Optional if license != other, Required otherwise. Specify \"LICENSE\" or \"LICENSE.md\" to link to a file of that name inside the repo, or a URL to a remote file.\ntags:\n- {tag_0} # Optional. Example: audio\n- {tag_1} # Optional. Example: automatic-speech-recognition\nowners:\n- organization: {organization_name} # Required. Example: BZK\n oin: {oin} # Optional. Example: 00000001003214345000\n name: {owner_name} # Optional. Example: John Doe\n email: {owner_email} # Optional. Example: johndoe@email.com\n role: {owner_role} # Optional. Example: Data Scientist.\n\nmodel-index:\n- name: {model_id} # Required. Example: CatClassifier.\n model: {model_uri} # Required. URI to a repository containing the model file.\n artifacts:\n - uri: {model_artifact_uri} # Optional. Example: \"https://github.com/MinBZK/poc-kijkdoos-wasm-models/raw/main/logres_iris/logreg_iris.onnx\"\n - content-type: {model_artifact_type} # Optional. Example: \"application/onnx\".\n - md5-checksum: {md5_checksum} # Optional. Example: \"120EA8A25E5D487BF68B5F7096440019\"\n parameters:\n - name: {parameter_name} # Optional. Example: \"epochs\".\n dtype: {parameter_dtype} # Optional. Example: \"int\".\n value: {parameter_value} # Optional. Example: 100.\n labels:\n - name: {label_name} # Optional. Example: \"gender\".\n dtype: {label_type} # Optional. Example: \"string\".\n value: {label_value} # Optional. Example: \"female\".\n results:\n - task:\n type: {task_type} # Required. Example: image-classification.\n name: {task_name} # Optional. Example: Image Classification.\n datasets:\n - type: {dataset_type} # Required. Example: common_voice. Link to a repository containing the dataset\n name: {dataset_name} # Required. Example: \"Common Voice (French)\". A pretty name for the dataset.\n split: {split} # Optional. Example: \"train\".\n features:\n - {feature_name} # Optional. Example: \"gender\".\n revision: {dataset_version} # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb\n metrics:\n - type: {metric_type} # Required. Example: false-positive-rate. Use metric id from https://hf.co/metrics.\n name: {metric_name} # Required. Example: \"FPR wrt class 0 restricted to feature gender:0 and age:21\".\n dtype: {metric_dtype} # Required. Example: \"float\".\n value: {metric_value} # Required. Example: 0.75.\n labels:\n - name: {label_name} # Optional. Example: \"gender\".\n type: {label_type} # Optional. Example: \"feature\".\n dtype: {label_type} # Optional. Example: \"string\".\n value: {label_value} # Optional. Example: \"female\".\n measurements:\n # Bar plots should be able to capture SHAP and Robustness Toolbox from AI Verify.\n bar_plots:\n - type: {measurement_type} # Required. Example: \"SHAP\".\n name: {measurement_name} # Optional. 
Example: \"Mean Absolute Shap Values\".\n results:\n - name: {bar_name} # Required. The name of a bar.\n value: {bar_value} # Required. The corresponding value.\n # Graph plots should be able to capture graph based measurements such as partial dependence and accumulated local effect.\n graph_plots:\n - type: {measurement_type} # Required. Example: \"partial_dependence\".\n name: {measurement_name} # Optional. Example: \"Partial Dependence Plot\".\n # Results store the graph plot data. So far all plots are dependent on a combination of a specific class (sometimes) and feature (always).\n # For example partial dependence plots are made for each feature and class.\n results:\n - class: {class_name} # Optional. Name of the output class the graph depends on.\n feature: {feature_name} # Required. Name of the feature the graph depends on.\n data:\n - x_value: {x_value} # Required. The x value of the graph data.\n y_value: {y_value} # Required. The y value of the graph data.\n
"},{"location":"projects/tad/reporting-standard/0.1a5/#assessment-card","title":"Assessment Card","text":"provenance: # Optional.\n git_commit_hash: {git_commit_hash} # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb\n timestamp: {modification_timestamp} # Optional. Example: 2024-04-16T16:48:14Z.\n uri: {modification_uri} # Optional. Example: https://github.com/MinBZK/tad-conversion-tool\n author: {modification_author} # Optional. Example: John Doe\nname: {assessment_name} # Required. Example: IAMA.\ndate: {assessment_date} # Required. Example: 25-03-2025.\ncontents:\n - question: {question_text} # Required. Example: \"Question 1: ...\".\n answer: {answer_text} # Required. Example: \"Answer: ...\".\n remarks: {remarks_text} # Optional. Example: \"Remarks: ...\".\n authors: # Optional. Example: \"['John', 'Peter']\".\n - name: {author_name}\n timestamp: {timestamp} # Optional. Example: 2024-04-16T16:48:14Z.\n
"},{"location":"projects/tad/reporting-standard/0.1a5/#schema","title":"Schema","text":"JSON schema will be added when we publish the first beta version.
"},{"location":"projects/tad/reporting-standard/0.1a5/#changelog","title":"Changelog","text":"Deviation from the Hugging Face specification is in the License field. Hugging Face only accepts dataset id's from Hugging Face license list while we accept any license from Open Source License List.\u00a0\u21a9\u21a9
Deviation from the Hugging Face specification is in the model_index:results:dataset
field. Hugging Face only accepts one dataset, while we accept a list of datasets.
Deviation from the Hugging Face specification is in the Dataset Type field. Hugging Face only accepts dataset IDs from Hugging Face datasets, while we also allow any URL pointing to the dataset.
For this extension to work, relevant metrics (such as false positive rate) have to be added to the Hugging Face metrics; possibly this can be done in our organizational namespace.
This document describes the Transparency of Algorithmic Decision making (TAD) Reporting Standard.
For reproducibility, governance, auditing and sharing of algorithmic systems it is essential to have a reporting standard so that information about an algorithmic system can be shared. This reporting standard describes how information about the different phases of an algorithm's life cycle can be reported. It contains, among other things, descriptive information combined with information about the technical tests and assessments applied.
Disclaimer
The TAD Reporting Standard is a work in progress. This means that the current standard is probably suboptimal and will change significantly in future versions.
"},{"location":"projects/tad/reporting-standard/0.1a6/#introduction","title":"Introduction","text":"Inspired by Model Cards for Model Reporting and Papers with Code Model Index this standard almost1 2 3 4 extends the Hugging Face model card metadata specification to allow for:
metrics_field
from the Hugging Face metadata specification.measurements
.assessments
.Following Hugging Face, this proposed standard will be written in YAML.
This standard does not contain all fields present in the Hugging Face metadata specification. The fields that are optional in the Hugging Face specification and are specific to the Hugging Face interface are omitted.
Another difference is that we divide our implementation into three separate parts.
system_card
, containing information about a group of ML-models which accomplish a specific task.model_card
, containing information about a specific data science model.assessment_card
, containing information about a regulatory assessment.Include statements
These model_card
s and assessment_card
s can be included verbatim into a system_card
, or referenced with an !include
statement, allowing for minimal cards to be compact in a single file. Extensive cards can be split up for readability and maintainability. Our standard allows for the !include
to be used anywhere.
The standard will be written in YAML. Example YAML files are given in the next section. The standard defines three cards: a system_card
, a model_card
and an assessment_card
. A system_card
contains information about an algorithmic system. It can have multiple models and each of these models should have a model_card
. Regulatory assessments can be processed in an assessment_card
. Note that model_card
s and assessment_card
s can be included directly into the system_card
or can be included as separate YAML files with the help of a YAML-include mechanism. For clarity, the latter is preferred and is also used in the examples in the next section.
system_card
","text":"A system_card
contains the following information.
schema_version
(REQUIRED, string). Version of the schema used, for example \"0.1a2\".provenance
(OPTIONAL). In case this System Card is generated from another source file, this field can capture the historical context of the contents of this System Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this System Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, i.e. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of person that initiated the transformations.name
(OPTIONAL, string). Name used to describe the system.
upl
(OPTIONAL, string). If this algorithm is part of a product offered by the Dutch Government, it should contain a URI from the Uniform Product List.owners
(OPTIONAL, list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the model. If oin
is NOT provided, this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.description
(OPTIONAL, string). A short description of the system.
labels
(OPTIONAL, list). This field allows storing meta information about a system. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). Name of the label.value
(OPTIONAL, string). Value of the label.status
(OPTIONAL, string). The status of the system. For example the status can be \"production\".
publication_category
(OPTIONAL, enum[string]). The publication category of the algorithm should be chosen from [\"high_risk\", \"other\"]
.begin_date
(OPTIONAL, string). The first date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.end_date
(OPTIONAL, string). The last date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.goal_and_impact
(OPTIONAL, string). The purpose of the system and the impact it has on citizens and companies.considerations
(OPTIONAL, string). The pros and cons of using the system.risk_management
(OPTIONAL, string). Description of the risks associated with the system.human_intervention
(OPTIONAL, string). A description of to what extent there is human involvement in the system.legal_base
(OPTIONAL, list). If there exists a legal base for the process the system is embedded in, this field can be filled in with the relevant laws. There can be multiple legal bases. For each legal base the following fields are present.
name
(OPTIONAL, string). Name of the law.link
(OPTIONAL, string). URI pointing towards the contents of the law.used_data
(OPTIONAL, string). An overview of the data that is used in the system.
technical_design
(OPTIONAL, string). Description of how the system works.external_providers
(OPTIONAL, list). If relevant, these fields allow storing information on external providers. There can be multiple external providers.
name
(OPTIONAL, string). Name of the external provider.version
(OPTIONAL, string). Version of the external provider reflecting its relation to previous versions.references
(OPTIONAL, list[string]). Additional reference URIs pointing to relevant information about the system.
interaction_details
(OPTIONAL, list[string]). Explain how the AI system interacts with hardware or software, including other AI systems, or how the AI system can be used to interact with hardware or software.version_requirements
(OPTIONAL, list[string]). Describe the versions of the relevant software or firmware, and any requirements related to version updates.deployment_variants
(OPTIONAL, list[string]). Description of all the forms in which the AI system is placed on the market or put into service, such as software packages embedded into hardware, downloads, or APIs.hardware_requirements
(OPTIONAL, list[string]). Provide a description of the hardware on which the AI system must be run.product_markings
(OPTIONAL, list[string]). If the AI system is a component of products, photos, or illustrations, describe the external features, markings, and internal layout of those products.user_interface
(OPTIONAL, list). Provide information on the user interface provided to the user responsible for its operation.
description
(OPTIONAL, string). A description of the provided user interface.link
(OPTIONAL, string). A link to the user interface can be included.snapshot
(OPTIONAL, string). A snapshot/screenshot of the user interface can be included with the use of a hyperlink.models
(OPTIONAL, list[ModelCard]). A list of model cards (as defined below) or !include
s of a YAML file containing a model card. This model card can for example be a model card described in the next section or a model card from Hugging Face. There can be multiple model cards, meaning multiple models are used.
assessments
(OPTIONAL, list[AssessmentCard]). A list of assessment cards (as defined below) or !include
s of a YAML file containing an assessment card. This assessment card is an assessment card described in the next section. There can be multiple assessment cards, meaning multiple assessments were performed.
model_card
","text":"A model_card
contains the following information.
provenance
(OPTIONAL). In case this Model Card is generated from another source file, this field can capture the historical context of the contents of this Model Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this Model Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, i.e. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of person that initiated the transformations.language
(OPTIONAL, list[string]). If relevant, the natural languages the model supports in ISO 639. There can be multiple languages.
license
(REQUIRED).
license_name
(REQUIRED, string). Any license from the open source license list1. If the license is NOT present in the license list, this field must be set to 'other' and the license_link field will be REQUIRED.license_link
(OPTIONAL, string). A link to a file of that name inside the repo, or a URL to a remote file containing the license contents.tags
(OPTIONAL, list[string]). Tags with keywords to describe the project. There can be multiple tags.
owners
(OPTIONAL, list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the model. If oin
is NOT provided, this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.model_index
(REQUIRED, list). There can be multiple models. For each model the following fields are present.
name
(REQUIRED, string). The name of the model.model
(REQUIRED, string). A URI pointing to a repository containing the model file.artifacts
(OPTIONAL, list). A list of artifacts.
uri
(OPTIONAL, string). URI referring to a relevant model artifact.content-type
(OPTIONAL, string). The type of the artifact, following the Content-Type convention. Recognized values are \"application/onnx\", to refer to an ONNX representation of the model.md5-checksum
(OPTIONAL, string). Checksum for the content of the file.parameters
(OPTIONAL, list). There can be multiple parameters. For each parameter the following fields are present.
name
(REQUIRED, string). The name of the parameter, for example \"epochs\".dtype
(OPTIONAL, string). The datatype of the parameter, for example \"int\".value
(OPTIONAL, string). The value of the parameter, for example 100.labels
(OPTIONAL, list). This field allows storing meta information about a parameter. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the label.dtype
(OPTIONAL, string). The datatype of the feature. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the feature. If name
is set, this field is REQUIRED.results
(OPTIONAL, list). There can be multiple results. For each result the following fields are present.
task
(OPTIONAL, list).
task_type
(REQUIRED, string). The task of the model, for example \"object-classification\".task_name
(OPTIONAL, string). A pretty name for the model tasks, for example \"Object Classification\".datasets
(OPTIONAL, list). There can be multiple datasets 2. For each dataset the following fields are present.
type
(REQUIRED, string). The type of the dataset, which can be a dataset id from Hugging Face datasets or any other link to a repository containing the dataset3, for example \"common_voice\".name
(REQUIRED, string). A pretty name for the dataset, for example \"Common Voice (French)\".split
(OPTIONAL, string). The split of the dataset, for example \"train\".features
(OPTIONAL, list[string]). List of feature names.revision
(OPTIONAL, string). Version of the dataset, for example \"5503434ddd753f426f4b38109466949a1217c2bb\".metrics
(OPTIONAL, list). There can be multiple metrics. For each metric the following fields are present.
type
(REQUIRED, string). A metric-id from Hugging Face metrics4, for example accuracy.name
(REQUIRED, string). A descriptive name of the metric. For example \"false positive rate\" is not a descriptive name, but \"training false positive rate w.r.t class x\" is.dtype
(REQUIRED, string). The data type of the metric, for example float
.value
(REQUIRED, string). The value of the metric.labels
(OPTIONAL, list). This field allows storing meta information about a metric. Metrics can, for example, be computed on subgroups of specific features: one can compute the accuracy for examples where the feature \"gender\" is set to \"male\". There can be multiple subgroups, which means that the metric is computed on the intersection of those subgroups. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the feature. For example: \"gender\".type
(OPTIONAL, string). The type of the label. Can for example be set to \"feature\" or \"output_class\". If name
is set, this field is REQUIRED.dtype
(OPTIONAL, string). The datatype of the feature, for example float
. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the feature. If name
is set, this field is REQUIRED. For example: \"male\".measurements
.
bar_plots
(OPTIONAL, list). The purpose of this field is to capture bar-plot-like measurements, for example SHAP values. There can be multiple bar plots. For each bar plot the following fields are present.
type
(REQUIRED, string). The type of bar plot, for example \"SHAP\".name
(OPTIONAL, string). A pretty name for the plot, for example \"Mean Absolute SHAP Values\".results
(REQUIRED, list). The contents of the bar plot. A result represents a bar. There can be multiple results. For each result the following fields are present.
name
(REQUIRED, string). The name of the bar.value
(REQUIRED, float). The value of the corresponding bar.graph_plots
(OPTIONAL, list). The purpose of this field is to capture graph-plot-like measurements, such as partial dependence plots. There can be multiple graph plots. For each graph plot the following fields are present.
type
(REQUIRED, string). The type of the graph plot, for example \"partial_dependence\".name
(OPTIONAL, string). A pretty name of the graph, for example \"Partial Dependence Plot\".results
(REQUIRED, list). The results field contains the graph plot data. Each graph can depend on a specific output class and feature. There can be multiple results. For each result the following fields are present. A filled-in sketch of the measurements section is given after the model card example in the next section.
class
(OPTIONAL, string/int/float/bool). The output class name that the graph corresponds to. This field is not always present.feature
(REQUIRED, string). The feature the graph corresponds to. This is required, since all relevant graphs are dependent on features.data
(REQUIRED, list)
x_value
(REQUIRED, float). The $x$-value of the graph.y_value
(REQUIRED, float). The $y$-value of the graph.assessment_card
","text":"An assessment_card
contains the following information.
provenance
(OPTIONAL). In case this Assessment Card is generated from another source file, this field can capture the historical context of the contents of this Assessment Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this Assessment Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, i.e. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of person that initiated the transformations.name
(REQUIRED, string). The name of the assessment.
date
(REQUIRED, string). The date at which the assessment is completed. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.contents
(REQUIRED, list). There can be multiple items in contents. For each item the following fields are present:
question
(REQUIRED, string). A question.answer
(REQUIRED, string). An answer.remarks
(OPTIONAL, string). A field to put relevant discussion remarks in.authors
(OPTIONAL, list). There can be multiple names. For each name the following field is present.
name
(OPTIONAL, string). The name of the author of the question.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of the answer. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, i.e. 2024-04-16T16:48:14Z
.
version: {system_card_version}\nprovenance:\n git_commit_hash: {git_commit_hash}\n timestamp: {modification_timestamp}\n uri: {modification_uri}\n author: {modification_author}\nname: {system_name}\nupl: {upl_uri}\nowners:\n - oin: {oin}\n organization: {organization_name}\n name: {owner_name}\n email: {owner_email}\n role: {owner_role}\ndescription: {system_description}\nlabels:\n - name: {label_name}\n value: {label_value}\nstatus: {system_status}\npublication_category: {system_publication_cat}\nbegin_date: {system_begin_date}\nend_date: {system_end_date}\ngoal_and_impact: {system_goal_and_impact}\nconsiderations: {system_considerations}\nrisk_management: {system_risk_management}\nhuman_intervention: {system_human_intervention}\nlegal_base:\n - name: {law_name}\n link: {law_uri}\nused_data: {system_used_data}\ntechnical_design: {technical_design}\nexternal_providers:\n - name: {name_external_provider}\n version: {version_external_provider}\nreferences:\n - {reference_uri}\ninteraction_details:\n - {system_interaction_details}\nversion_requirements:\n - {system_version_requirements}\ndeployment_variants:\n - {system_deployment_variants}\nhardware_requirements:\n - {system_hardware_requirements}\nproduct_markings:\n - {system_product_markings}\nuser_interface:\n - description: {system_user_interface}\n link: {system_user_interface_uri}\n snapshot: {system_user_interface_snapshot_uri}\n\nmodels:\n - !include {model_card_uri}\n\nassessments:\n - !include {assessment_card_uri}\n
"},{"location":"projects/tad/reporting-standard/0.1a6/#model-card","title":"Model Card","text":"provenance:\n git_commit_hash: {git_commit_hash}\n timestamp: {modification_timestamp}\n uri: {modification_uri}\n author: {modification_author}\nlanguage:\n - {lang_0}\nlicense:\n license_name: {license_name}\n license_link: {license_uri}\ntags:\n - {tag_0}\nowners:\n - oin: {oin}\n organization: {organization_name}\n name: {owner_name}\n email: {owner_email}\n role: {owner_role}\n\nmodel-index:\n - name: {model_id}\n model: {model_uri}\n artifacts:\n - uri: {model_artifact_uri}\n - content-type: {model_artifact_type}\n - md5-checksum: {md5_checksum}\n parameters:\n - name: {parameter_name}\n dtype: {parameter_dtype}\n value: {parameter_value}\n labels:\n - name: {label_name}\n dtype: {label_type}\n value: {label_value}\n results:\n - task:\n - type: {task_type}\n name: {task_name}\n datasets:\n - type: {dataset_type}\n name: {dataset_name}\n split: {split}\n features:\n - {feature_name}\n revision: {dataset_version}\n metrics:\n - type: {metric_type}\n name: {metric_name}\n dtype: {metric_dtype}\n value: {metric_value}\n labels:\n - name: {label_name}\n type: {label_type}\n dtype: {label_type}\n value: {label_value}\n measurements:\n bar_plots:\n - type: {measurement_type}\n name: {measurement_name}\n results:\n - name: {bar_name}\n value: {bar_value}\n graph_plots:\n - type: {measurement_type}\n name: {measurement_name}\n results:\n - class: {class_name}\n feature: {feature_name}\n data:\n - x_value: {x_value}\n y_value: {y_value}\n
"},{"location":"projects/tad/reporting-standard/0.1a6/#assessment-card","title":"Assessment Card","text":"provenance:\n git_commit_hash: {git_commit_hash}\n timestamp: {modification_timestamp}\n uri: {modification_uri}\n author: {modification_author}\nname: {assessment_name}\ndate: {assessment_date}\ncontents:\n - question: {question_text}\n answer: {answer_text}\n remarks: {remarks_text}\n authors:\n - name: {author_name}\n timestamp: {timestamp}\n
"},{"location":"projects/tad/reporting-standard/0.1a6/#schema","title":"Schema","text":"JSON schema will be added when we publish the first beta version.
"},{"location":"projects/tad/reporting-standard/0.1a6/#changelog","title":"Changelog","text":"Deviation from the Hugging Face specification is in the License field. Hugging Face only accepts dataset id's from Hugging Face license list while we accept any license from Open Source License List.\u00a0\u21a9\u21a9
Deviation from the Hugging Face specification is in the model_index:results:dataset
field. Hugging Face only accepts one dataset, while we accept a list of datasets.
Deviation from the Hugging Face specification is in the Dataset Type field. Hugging Face only accepts dataset IDs from Hugging Face datasets, while we also allow any URL pointing to the dataset.
For this extension to work, relevant metrics (such as false positive rate) have to be added to the Hugging Face metrics; possibly this can be done in our organizational namespace.
This document describes the Transparency of Algorithmic Decision making (TAD) Reporting Standard.
For reproducibility, governance, auditing and sharing of algorithmic systems it is essential to have a reporting standard so that information about an algorithmic system can be shared. This reporting standard describes how information about the different phases of an algorithm's life cycle can be reported. It contains, among other things, descriptive information combined with information about the technical tests and assessments applied.
Disclaimer
The TAD Reporting Standard is a work in progress. This means that the current standard is probably suboptimal and will change significantly in future versions.
"},{"location":"projects/tad/reporting-standard/latest/#introduction","title":"Introduction","text":"Inspired by Model Cards for Model Reporting and Papers with Code Model Index this standard almost1 2 3 4 extends the Hugging Face model card metadata specification to allow for:
metrics_field
from the Hugging Face metadata specification.measurements
.assessments
.Following Hugging Face, this proposed standard will be written in YAML.
This standard does not contain all fields present in the Hugging Face metadata specification. The fields that are optional in the Hugging Face specification and are specific to the Hugging Face interface are omitted.
Another difference is that we divide our implementation into three separate parts.
system_card
, containing information about a group of ML-models which accomplish a specific task.model_card
, containing information about a specific data science model.assessment_card
, containing information about a regulatory assessment.Include statements
These model_card
s and assessment_card
s can be included verbatim into a system_card
, or referenced with an !include
statement, allowing for minimal cards to be compact in a single file. Extensive cards can be split up for readability and maintainability. Our standard allows for the !include
to be used anywhere.
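As a sketch of how this can look on disk (the file and system names are hypothetical; cat_classifier_model.yaml and iama.yaml follow the examples used throughout this standard), a compact system_card references its model and assessment cards as separate files:

# system_card.yaml
name: ExampleSystem
models:
- !include cat_classifier_model.yaml
assessments:
- !include iama.yaml

Note that !include is not part of the YAML specification itself; tooling that reads these files needs a YAML loader extended with an include constructor that resolves the referenced paths.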
The standard will be written in YAML. Example YAML files are given in the next section. The standard defines three cards: a system_card
, a model_card
and an assessment_card
. A system_card
contains information about an algorithmic system. It can have multiple models and each of these models should have a model_card
. Regulatory assessments can be processed in an assessment_card
. Note that model_card
s and assessment_card
s can be included directly into the system_card
or can be included as separate YAML files with the help of a YAML-include mechanism. For clarity, the latter is preferred and is also used in the examples in the next section.
system_card
","text":"A system_card
contains the following information.
schema_version
(REQUIRED, string). Version of the schema used, for example \"0.1a2\".provenance
(OPTIONAL). In case this System Card is generated from another source file, this field can capture the historical context of the contents of this System Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this System Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, i.e. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of person that initiated the transformations.name
(OPTIONAL, string). Name used to describe the system.
upl
(OPTIONAL, string). If this algorithm is part of a product offered by the Dutch Government, it should contain a URI from the Uniform Product List.owners
(OPTIONAL, list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the model. If oin
is NOT provided, this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.description
(OPTIONAL, string). A short description of the system.
labels
(OPTIONAL, list). This field allows storing meta information about a system. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). Name of the label.value
(OPTIONAL, string). Value of the label.status
(OPTIONAL, string). The status of the system. For example the status can be \"production\".
publication_category
(OPTIONAL, enum[string]). The publication category of the algorithm should be chosen from [\"high_risk\", \"other\"]
.begin_date
(OPTIONAL, string). The first date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.end_date
(OPTIONAL, string). The last date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.goal_and_impact
(OPTIONAL, string). The purpose of the system and the impact it has on citizens and companies.considerations
(OPTIONAL, string). The pros and cons of using the system.risk_management
(OPTIONAL, string). Description of the risks associated with the system.human_intervention
(OPTIONAL, string). A description of the extent to which there is human involvement in the system.legal_base
(OPTIONAL, list). If there exists a legal base for the process the system is embedded in, this field can be filled in with the relevant laws. There can be multiple legal bases. For each legal base the following fields are present.
name
(OPTIONAL, string). Name of the law.link
(OPTIONAL, string). URI pointing towards the contents of the law.used_data
(OPTIONAL, string). An overview of the data that is used in the system.
technical_design
(OPTIONAL, string). Description on how the system works.external_providers
(OPTIONAL, list). If relevant, these fields allow storing information on external providers. There can be multiple external providers.
name
(OPTIONAL, string). Name of the external provider.version
(OPTIONAL, string). Version of the external provider reflecting its relation to previous versions.references
(OPTIONAL, list[string]). Additional reference URIs that point to relevant information about the system.
interaction_details
(OPTIONAL, list[string]). Explain how the AI system interacts with hardware or software, including other AI systems, or how the AI system can be used to interact with hardware or software.version_requirements
(OPTIONAL, list[string]). Describe the versions of the relevant software or firmware, and any requirements related to version updates.deployment_variants
(OPTIONAL, list[string]). Description of all the forms in which the AI system is placed on the market or put into service, such as software packages embedded into hardware, downloads, or APIs.hardware_requirements
(OPTIONAL, list[string]). Provide a description of the hardware on which the AI system must be run.product_markings
(OPTIONAL, list[string]). If the AI system is a component of products, photos, or illustrations, describe the external features, markings, and internal layout of those products.user_interface
(OPTIONAL, list). Provide information on the user interface provided to the user responsible for its operation.
description
(OPTIONAL, string). A description of the provided user interface.link
(OPTIONAL, string). A link to the user interface can be included.snapshot
(OPTIONAL, string). A snapshot/screenshot of the user interface can be included with the use of a hyperlink.models
(OPTIONAL, list[ModelCard]). A list of model cards (as defined below) or !include
s of a YAML file containing a model card. This model card can for example be a model card described in the next section or a model card from Hugging Face. There can be multiple model cards, meaning multiple models are used.
assessments
(OPTIONAL, list[AssessmentCard]). A list of assessment cards (as defined below) or !include
s of a YAML file containing an assessment card. This assessment card is an assessment card as described in the next section. There can be multiple assessment cards, meaning multiple assessments were performed.
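To make the field descriptions above concrete, a minimal system_card could look as follows (a sketch; all values are illustrative, and only schema_version is REQUIRED):
schema_version: 0.1a2\nname: Digital Assistant\ndescription: A chatbot answering questions from citizens.\nstatus: production\npublication_category: other\nbegin_date: 2024-01-01\nmodels:\n - !include model_card.yaml\nassessments:\n - !include assessment_card.yaml\n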
model_card
","text":"A model_card
contains the following information.
provenance
(OPTIONAL). In case this Model Card is generated from another source file, this field can capture the historical context of the contents of this Model Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this Model Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, e.g. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of person that initiated the transformations.language
(OPTIONAL, list[string]). If relevant, the natural languages the model supports in ISO 639. There can be multiple languages.
license
(REQUIRED).
license_name
(REQUIRED, string). Any license from the open source license list1. If the license is NOT present in the license list this field must be set to 'other' and the following two fields will be REQUIRED.license_link
(OPTIONAL, string). A link to a file of that name inside the repo, or a URL to a remote file containing the license contents.tags
(OPTIONAL, list[string]). Tags with keywords to describe the project. There can be multiple tags.
owners
(OPTIONAL, list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the model. If oin
is NOT provided this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.model_index
(REQUIRED, list). There can be multiple models. For each model the following fields are present.
name
(REQUIRED, string). The name of the model.model
(REQUIRED, string). A URI pointing to a repository containing the model file.artifacts
(OPTIONAL, list). A list of artifacts. For each artifact the following fields are present.
uri
(OPTIONAL, string). URI that refers to a relevant model artifact.content-type
(OPTIONAL, string). Optional type, following the Content-Type convention. A recognized value is \"application/onnx\", which refers to an ONNX representation of the model.md5-checksum
(OPTIONAL, string). Optional MD5 checksum for the content of the file.parameters
(OPTIONAL, list). There can be multiple parameters. For each parameter the following fields are present.
name
(REQUIRED, string). The name of the parameter, for example \"epochs\".dtype
(OPTIONAL, string). The datatype of the parameter, for example \"int\".value
(OPTIONAL, string). The value of the parameter, for example 100.labels
(OPTIONAL, list). This field allows storing meta information about a parameter. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the label.dtype
(OPTIONAL, string). The datatype of the label. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the label. If name
is set, this field is REQUIRED.results
(OPTIONAL, list). There can be multiple results. For each result the following fields are present.
task
(OPTIONAL, list).
task_type
(REQUIRED, string). The task of the model, for example \"object-classification\".task_name
(OPTIONAL, string). A pretty name for the model tasks, for example \"Object Classification\".datasets
(OPTIONAL, list). There can be multiple datasets 2. For each dataset the following fields are present.
type
(REQUIRED, string). The type of the dataset, can be a dataset id from Hugging Face datasets or any other link to a repository containing the dataset3, for example \"common_voice\".name
(REQUIRED, string). A pretty name for the dataset, for example \"Common Voice (French)\".split
(OPTIONAL, string). The split of the dataset, for example \"train\".features
(OPTIONAL, list[string]). List of feature names.revision
(OPTIONAL, string). Version of the dataset, for example \"5503434ddd753f426f4b38109466949a1217c2bb\".metrics
(OPTIONAL, list). There can be multiple metrics. For each metric the following fields are present.
type
(REQUIRED, string). A metric-id from Hugging Face metrics4, for example accuracy.name
(REQUIRED, string). A descriptive name of the metric. For example \"false positive rate\" is not a descriptive name, but \"training false positive rate w.r.t class x\" is.dtype
(REQUIRED, string). The data type of the metric, for example float
.value
(REQUIRED, string). The value of the metric.labels
(OPTIONAL, list). This field allows storing meta information about a metric. For example, metrics can be computed on subgroups of specific features, such as the accuracy for examples where the feature \"gender\" is set to \"male\". There can be multiple subgroups, which means that the metric is computed on the intersection of those subgroups. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the feature. For example: \"gender\".type
(OPTIONAL, string). The type of the label. Can for example be set to \"feature\" or \"output_class\". If name
is set, this field is REQUIRED.dtype
(OPTIONAL, string). The datatype of the feature, for example float
. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the feature. If name
is set, this field is REQUIRED. For example: \"male\".measurements
.
bar_plots
(OPTIONAL, list). The purpose of this field is to capture bar plot like measurements, for example SHAP values. There can be multiple bar plots. For each bar plot the following fields are present.
type
(REQUIRED, string). The type of bar plot, for example \"SHAP\".name
(OPTIONAL, string). A pretty name for the plot, for example \"Mean Absolute SHAP Values\".results
(REQUIRED, list). The contents of the bar plot. A result represents a bar. There can be multiple results. For each result the following fields are present.
name
(REQUIRED, string). The name of the bar.value
(REQUIRED, float). The value of the corresponding bar.graph_plots
(OPTIONAL, list). The purpose of this field is to capture graph plot like measurements, such as partial dependence plots. There can be multiple graph plots. For each graph plot the following fields are present.
type
(REQUIRED, string). The type of the graph plot, for example \"partial_dependence\".name
(OPTIONAL, string). A pretty name of the graph, for example \"Partial Dependence Plot\".results
(REQUIRED, list). Results contains the graph plot data. Each graph can depend on a specific output class and feature. There can be multiple results. For each result the following fields are present.
class
(OPTIONAL, string/int/float/bool). The output class name that the graph corresponds to. This field is not always present.feature
(REQUIRED, string). The feature the graph corresponds to. This is required, since all relevant graphs are dependent on features.data
(REQUIRED, list)
x_value
(REQUIRED, float). The $x$-value of the graph.y_value
(REQUIRED, float). The $y$-value of the graph.assessment_card
","text":"An assessment_card
contains the following information.
provenance
(OPTIONAL). In case this Assessment Card is generated from another source file, this field can capture the historical context of the contents of this Assessment Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this Assessment Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, e.g. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of person that initiated the transformations.name
(REQUIRED, string). The name of the assessment.
urn
(OPTIONAL, string). A Uniform Resource Name (URN) of the instrument in the instrument register.date
(REQUIRED, string). The date at which the assessment is completed. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.contents
(REQUIRED, list). There can be multiple items in contents. For each item the following fields are present:
question
(REQUIRED, string). A question.urn
(OPTIONAL, string). A Uniform Resource Name (URN) of the corresponding task in the instrument register.answer
(REQUIRED, string). An answer.remarks
(OPTIONAL, string). A field to put relevant discussion remarks in.authors
(OPTIONAL, list). There can be multiple names. For each name the following field is present.
name
(OPTIONAL, string). The name of the author of the question.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of the answer. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, e.g. 2024-04-16T16:48:14Z
.
schema_version: {system_card_version}\nprovenance:\n git_commit_hash: {git_commit_hash}\n timestamp: {modification_timestamp}\n uri: {modification_uri}\n author: {modification_author}\nname: {system_name}\nupl: {upl_uri}\nowners:\n - oin: {oin}\n organization: {organization_name}\n name: {owner_name}\n email: {owner_email}\n role: {owner_role}\ndescription: {system_description}\nlabels:\n - name: {label_name}\n value: {label_value}\nstatus: {system_status}\npublication_category: {system_publication_cat}\nbegin_date: {system_begin_date}\nend_date: {system_end_date}\ngoal_and_impact: {system_goal_and_impact}\nconsiderations: {system_considerations}\nrisk_management: {system_risk_management}\nhuman_intervention: {system_human_intervention}\nlegal_base:\n - name: {law_name}\n link: {law_uri}\nused_data: {system_used_data}\ntechnical_design: {technical_design}\nexternal_providers:\n - name: {name_external_provider}\n version: {version_external_provider}\nreferences:\n - {reference_uri}\ninteraction_details:\n - {system_interaction_details}\nversion_requirements:\n - {system_version_requirements}\ndeployment_variants:\n - {system_deployment_variants}\nhardware_requirements:\n - {system_hardware_requirements}\nproduct_markings:\n - {system_product_markings}\nuser_interface:\n - description: {system_user_interface}\n link: {system_user_interface_uri}\n snapshot: {system_user_interface_snapshot_uri}\n\nmodels:\n - !include {model_card_uri}\n\nassessments:\n - !include {assessment_card_uri}\n
"},{"location":"projects/tad/reporting-standard/latest/#model-card","title":"Model Card","text":"provenance:\n git_commit_hash: {git_commit_hash}\n timestamp: {modification_timestamp}\n uri: {modification_uri}\n author: {modification_author}\nlanguage:\n - {lang_0}\nlicense:\n license_name: {license_name}\n license_link: {license_uri}\ntags:\n - {tag_0}\nowners:\n - oin: {oin}\n organization: {organization_name}\n name: {owner_name}\n email: {owner_email}\n role: {owner_role}\n\nmodel-index:\n - name: {model_id}\n model: {model_uri}\n artifacts:\n - uri: {model_artifact_uri}\n - content-type: {model_artifact_type}\n - md5-checksum: {md5_checksum}\n parameters:\n - name: {parameter_name}\n dtype: {parameter_dtype}\n value: {parameter_value}\n labels:\n - name: {label_name}\n dtype: {label_type}\n value: {label_value}\n results:\n - task:\n - type: {task_type}\n name: {task_name}\n datasets:\n - type: {dataset_type}\n name: {dataset_name}\n split: {split}\n features:\n - {feature_name}\n revision: {dataset_version}\n metrics:\n - type: {metric_type}\n name: {metric_name}\n dtype: {metric_dtype}\n value: {metric_value}\n labels:\n - name: {label_name}\n type: {label_type}\n dtype: {label_type}\n value: {label_value}\n measurements:\n bar_plots:\n - type: {measurement_type}\n name: {measurement_name}\n results:\n - name: {bar_name}\n value: {bar_value}\n graph_plots:\n - type: {measurement_type}\n name: {measurement_name}\n results:\n - class: {class_name}\n feature: {feature_name}\n data:\n - x_value: {x_value}\n y_value: {y_value}\n
"},{"location":"projects/tad/reporting-standard/latest/#assessment-card","title":"Assessment Card","text":"provenance:\n git_commit_hash: {git_commit_hash}\n timestamp: {modification_timestamp}\n uri: {modification_uri}\n author: {modification_author}\nname: {assessment_name}\nurn: {urn}\ndate: {assessment_date}\ncontents:\n - question: {question_text}\n urn: {urn}\n answer: {answer_text}\n remarks: {remarks_text}\n authors:\n - name: {author_name}\n timestamp: {timestamp}\n
"},{"location":"projects/tad/reporting-standard/latest/#schema","title":"Schema","text":"JSON schema will be added when we publish the first beta version.
"},{"location":"projects/tad/reporting-standard/latest/#changelog","title":"Changelog","text":"Deviation from the Hugging Face specification is in the License field. Hugging Face only accepts dataset id's from Hugging Face license list while we accept any license from Open Source License List.\u00a0\u21a9\u21a9
Deviation from the Hugging Face specification is in the model_index:results:dataset
field. Hugging Face only accepts one dataset, while we accept a list of datasets.\u00a0\u21a9\u21a9
Deviation from the Hugging Face specification is in the Dataset Type field. Hugging Face only accepts dataset ids from Hugging Face datasets, while we also allow any URL pointing to the dataset.\u00a0\u21a9\u21a9
For this extension to work, relevant metrics (such as the false positive rate) have to be added to the Hugging Face metrics; possibly this can be done in our organizational namespace.\u00a0\u21a9\u21a9
The purpose of a code review is to ensure the quality and readability of a change, and that all requirements from the ticket have been met, before it gets merged into the main codebase. Additionally, code reviews are a communication tool: they allow team members to stay aware of changes being made.
Code reviews involve having a team member examine the changes made by another team member and give feedback or ask questions if needed.
"},{"location":"way-of-working/code-reviews/#creating-a-pull-request","title":"Creating a Pull Request","text":"We use GitHub pull requests (PR) for code reviews. You can make a draft PR if your work is still in progress. When you are done you can remove the draft status. A team member may start reviewing when the PR does not have a draft status.
For team ADRs at least 3 approving reviews are required; if the ADR can be expected to be controversial, all team members should approve.
A team ADR is an ADR made in the ai-validation repository.
All other PRs need at least 1 approving review, but can have more reviewers if desired (by either reviewer or author).
"},{"location":"way-of-working/code-reviews/#review-process","title":"Review process","text":"By default the codeowner, indicated in the CODEOWNER file, will be requested to review. For us this is the GitHub team AI-validation. If the PR creator wants a specific team member to review, the PR creator should add the team member specifically in the reviewers section of the PR. A message in Mattermost will be posted for PRs. Then with the reaction of an emoji a reviewer will indicate they are looking at the PR.
If the reviewer has suggestions or comments, the PR creator can fix those or respond to the suggestions. When the PR creator thinks they have addressed the feedback, they must re-request a review from the person that did the review. The reviewer must then look at the changes and approve or add more comments. This process continues until the reviewer agrees that all is correct and approves the PR.
Once the review is approved, the reviewer checks whether the branch is in sync with the main branch before merging. If not, the reviewer rebases the branch. Once the branch is in sync with main, the reviewer merges the PR and checks whether the deployment is successful. If the deployment is not successful, the reviewer fixes it. If the PR needs more than one review, the last approving reviewer merges the PR.
"},{"location":"way-of-working/contributing/","title":"Contributing to AI Validation","text":"First off, thanks for taking the time to contribute! \u2764\ufe0f
All types of contributions are encouraged and valued. See the Table of Contents for different ways to help and details about how this project handles them. Please make sure to read the relevant section before making your contribution. It will make it a lot easier for us maintainers and smooth out the experience for all involved. The community looks forward to your contributions. \ud83c\udf89
"},{"location":"way-of-working/contributing/#table-of-contents","title":"Table of Contents","text":"This project and everyone participating in it is governed by the Code of Conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior to ai-validatie@minbzk.nl.
"},{"location":"way-of-working/contributing/#i-have-a-question","title":"I Have a Question","text":"Before you ask a question, it is best to search for existing Issues that might help you. In case you have found a suitable issue and still need clarification, you can write your question in this issue.
If you then still feel the need to ask a question and need clarification, we recommend the following:
We will then take care of the issue as soon as possible.
"},{"location":"way-of-working/contributing/#i-want-to-contribute","title":"I Want To Contribute","text":""},{"location":"way-of-working/contributing/#legal-notice","title":"Legal Notice","text":"When contributing to this project, you must agree that you have authored 100% of the content, that you have the necessary rights to the content and that the content you contribute may be provided under the project license.
"},{"location":"way-of-working/contributing/#reporting-bugs","title":"Reporting Bugs","text":""},{"location":"way-of-working/contributing/#before-submitting-a-bug-report","title":"Before Submitting a Bug Report","text":"A good bug report shouldn't leave others needing to chase you up for more information. Therefore, we ask you to investigate carefully, collect information and describe the issue in detail in your report. Please complete the following steps in advance to help us fix any potential bug as fast as possible.
You must never report security related issues, vulnerabilities or bugs including sensitive information to the issue tracker, or elsewhere in public. Instead sensitive bugs must be sent by email to ai-validatie@minbzk.nl.
We use GitHub issues to track bugs and errors. If you run into an issue with the project:
Once it's filed:
needs-repro
. Bugs with the needs-repro
tag will not be addressed until they are reproduced.needs-fix
, as well as possibly other tags (such as critical
), and the issue will be left to be implemented by someone.This section guides you through submitting an enhancement suggestion for this project, including completely new features and minor improvements. Following these guidelines will help maintainers and the community to understand your suggestion and find related suggestions.
"},{"location":"way-of-working/contributing/#before-submitting-an-enhancement","title":"Before Submitting an Enhancement","text":"Enhancement suggestions are tracked as GitHub issues.
We have commit message conventions: Commit convention
"},{"location":"way-of-working/contributing/#markdown-lint","title":"Markdown Lint","text":"We use Markdown lint to standardize Markdown: Markdown lint config.
"},{"location":"way-of-working/contributing/#pre-commit","title":"Pre-commit","text":"We use pre-commit to enabled standardization: pre-commit config.
"},{"location":"way-of-working/decision-log/","title":"Decision Log","text":"Throughout our work, small decisions about processes and approaches are often made in meetings and chats. While these aren't big enough for formal documentation like ADRs, capturing them is valuable for both current and future team members.
This log provides a reference point for those decisions.
"},{"location":"way-of-working/decision-log/#overview-of-decisions","title":"Overview of decisions","text":"We're sad to see you go! But if you do, here's what not to forget.
"},{"location":"way-of-working/off-boarding/#github","title":"GitHub","text":"For clarity and consistency, this document defines some terms used within our team where the meaning in Data Science or Computer Science differs, and terms that are for any reason good to mention.
For a full reference for Machine Learning, we recommend ML Fundamentals from Google.
"},{"location":"way-of-working/onboarding/","title":"Onboarding","text":"Make sure you have installed Mattermost, then follow these steps.
Make sure you have installed Webex, then follow these steps.
Make sure you have installed Tuple, then follow these steps.
Create or use your existing GitHub account.
Bookmark these links in your browser:
We use HashiCorp Vault secrets manager for team secrets. You can login with a GitHub Personal access token. The token needs organization read permissions (read:org
), and you should be part of our GitHub team to access the vault.
We are assuming your dev machine is a Mac. This guide is rather opinionated, feel free to have your own opinion, and feel free to contribute! Contributing can be done by clicking \"edit\" top right and by making a pull request on this repository.
"},{"location":"way-of-working/onboarding/dev-machine/#things-that-should-have-been-default-on-mac","title":"Things that should have been default on Mac","text":"Homebrew as the missing Package Manager
/bin/bash -c \"$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\"\n
Rectangle
brew install --cask rectangle\n
WebEx for video conferencing
brew install --cask webex\n
Mattermost for team communication
brew install --cask mattermost\n
Iterm2
brew install --cask iterm2\n
Oh My Zsh
/bin/bash -c \"$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)\"\n
Autosuggestions for zsh
git clone https://github.com/zsh-users/zsh-autosuggestions ~/.oh-my-zsh/custom/plugins/zsh-autosuggestions\n
Fish shell like syntax highlighting for Zsh
brew install zsh-syntax-highlighting\n
Add plugins to your shell in ~/.zshrc
plugins = (\n # other plugins...\n zsh-autosuggestions\n kubectl\n docker\n docker-compose\n pyenv\n z\n)\n
Touch ID in Terminal
Sourcetree
brew install --cask sourcetree\n
Pyenv
brew install pyenv\n
pyenv virtualenv
brew install pyenv-virtualenv\n
pre-commit
brew install pre-commit\n
Xcode Command Line Tools
xcode-select --install\n
TabbyML Opensource, self-hosted AI coding assistant
We can not just use hosted versions of coding assistants because of privacy and copyright issues. We can however use self-hosted coding assistants provided they are trained on data with permissive licenses.
StarCoder (1-7B) models are all trained on version 1.2 of The Stack dataset. It boils down to all open GitHub code with permissive licenses (193 licenses in total). Minus opt-out requests.
Code Lama and Deepseek models are not clear enough about their data licenses.
brew install tabbyml/tabby/tabby\ntabby serve --device metal --model TabbyML/StarCoder-3B\n
Then configure your IDE by installing a plugin.
Sign commits using SSH
Robbert is a highly enthusiastic full-stack engineer with a Bachelor's degree in Computer Science from the Hanze University of Applied Sciences in Groningen. He is passionate about building secure, compliant, and ethical solutions, and thrives in collaborative environments. Robbert is eager to leverage his skills and knowledge to help shape and propel the future of IT within the government.
uittenbroekrobbert
Robbert Uittenbroek
"},{"location":"about/team/#laurens-weijs","title":"Laurens Weijs","text":"Engineer
Laurens is a passionate guy with a love for innovation and doing things differently. With a background in Econometrics and Computer Science he loves to tackle the IT challenges of the Government by helping other people through extensive knowledge sharing on stage, building neural networks himself, or building a strong community.
laurensWe
Laurens Weijs
"},{"location":"about/team/#guusje-juijn","title":"Guusje Juijn","text":"Trainee
Guusje is currently enrolled in a two-year traineeship at the Dutch Government. After finishing her first assignment at a policy department, she is excited to bring her knowledge about AI policy to a technical team. Guusje has a background in Artificial Intelligence, is experienced in Python and machine learning and has a strong interest in AI ethics.
GuusjeJuijn
Guusje Juijn
"},{"location":"about/team/#ruben-rouwhof","title":"Ruben Rouwhof","text":"UX/UI Designer
Ruben is a dedicated UX/UI Designer focused on crafting user-centric digital experiences. He is involved in projects from start to finish, covering user research, design, and technical implementation.
rubenrouwhof
Ruben Rouwhof
rubenrouwhof.nl
"},{"location":"about/team/#ravi-meijer","title":"Ravi Meijer","text":"Product Researcher
Ravi is an accomplished data scientist with expertise in machine learning, responsible AI, and the data science lifecycle. Her background in AI fuels her passion for solving complex problems and driving innovation for positive social impact.
ravimeijerrig
Ravi Meijer
"},{"location":"about/team/#our-alumni","title":"Our Alumni","text":""},{"location":"about/team/#willy-tadema","title":"Willy Tadema","text":"AI Ethics Lead
Willy specializes in AI governance, AI risk management, AI assurance and ethics-by-design. She is an advocate of AI standards and a member of several ethics committees.
FrieseWoudloper
Willy Tadema
"},{"location":"adrs/0001-adrs/","title":"ADR-0001 ADRs","text":""},{"location":"adrs/0001-adrs/#context","title":"Context","text":"In modern software development practices, the use of Architecture Decision Records (ADRs) has become increasingly common. ADRs are documents that capture important architectural decisions made during the development process. These decisions play a crucial role in guiding the development team and ensuring consistency and coherence in the architecture of the software system.
"},{"location":"adrs/0001-adrs/#assumptions","title":"Assumptions","text":"We will utilize ADRs in our team to document and communicate architectural decisions effectively. Furthermore, we will publish these ADRs publicly to promote transparency and facilitate collaboration.
"},{"location":"adrs/0001-adrs/#template","title":"Template","text":"Use the template below to add an ADR:
# ADR-XXXX Title\n\n## Context\n\nWhat is the issue that we're seeing that is motivating this decision or change?\n\n## Assumptions\n\nAnything that could cause problems if untrue now or later. (optional)\n\n## Decision\n\nWhat is the change that we're proposing and/or doing?\n\n## Risks\n\nAnything that could cause malfunction, delay, or other negative impacts. (optional)\n\n## Consequences\n\nWhat becomes easier or more difficult to do because of this change?\n\n## More Information\n\nProvide additional evidence/confidence for the decision outcome\nLinks to other decisions and resources might here appear as well. (optional)\n
"},{"location":"adrs/0002-code-platform/","title":"ADR-0002 Code Platform","text":""},{"location":"adrs/0002-code-platform/#context","title":"Context","text":"In the landscape of software development, the choice of coding platform significantly impacts developer productivity, collaboration, and code quality. it's crucial to evaluate and select a coding platform that aligns with our development needs and fosters efficient workflows.
"},{"location":"adrs/0002-code-platform/#assumptions","title":"Assumptions","text":"The following assumptions are made:
After careful consideration and evaluation of various options like GitHub, GitLab and BitBucket, we propose adopting GitHub as our primary coding platform. The decision is based on the following factors:
Costs: There are currently no costs associate in using GitHub for our use cases.
Features and Functionality: GitHub offers a comprehensive set of features essential for modern software development and collaboration with external teams, including version control, code review, issue tracking, continuous integration, and deployment automation.
Security: GitHub offers a complete set of security features essential to secure development like dependency management and security scanning.
Community and Ecosystem: GitHub boasts a vibrant community and ecosystem, facilitating knowledge sharing, collaboration, and access to third-party tools, and services that can enhance our development workflows. Within our organization we have easy access to the team managing the GitHub organization.
Usability and User Experience: A user-friendly interface and intuitive workflows are essential for maximizing developer productivity and minimizing onboarding time. GitHub offers a streamlined user experience and customizable workflows that align with our team's preferences and practices.
"},{"location":"adrs/0002-code-platform/#risks","title":"Risks","text":"Currently the organization of MinBZK on GitHub does not have a lot of people
indicating that our team is an early adapter of the platform within the organization. This might impact our features due to cost constrains.
If we choose another tool in the future we need to migrate our codebase, and potentially need to rewrite some specific GitHub features that cannot be used in another tool.
"},{"location":"adrs/0002-code-platform/#more-information","title":"More Information","text":"Alternatives considered:
Our development team wants to implement a CI/CD solution to streamline the build, testing, and deployment workflows of our software products. Currently, our codebase resides on GitHub, and we leverage Kubernetes as our chosen orchestration platform, managed by the DigiLab platform team.
"},{"location":"adrs/0003-ci-cd/#decision","title":"Decision","text":"We will use the following tools for CI/CD pipeline:
GitHub Actions aligns with our existing infrastructure, ensuring seamless integration with our codebase and minimizing operational overhead. GitHub Actions' specific syntax for CI results in vendor lock-in, necessitating significant effort to migrate to an alternative CI system in the future.
Flux, being a GitOps operator for Kubernetes, offers a declarative approach to managing deployments, enhancing reliability and repeatability within our Kubernetes ecosystem.
"},{"location":"adrs/0004-software-hosting-platform/","title":"ADR-0004 Software hosting platform","text":""},{"location":"adrs/0004-software-hosting-platform/#context","title":"Context","text":"Our team recognizes the necessity of a platform to run our software, as our local machines lack the capacity to handle certain workloads effectively. We have evaluated several options available to us:
We operate under the following assumptions:
We will use Digilab Kubernetes for our workloads.
"},{"location":"adrs/0004-software-hosting-platform/#consequences","title":"Consequences","text":"By choosing Digilab Kubernetes, we gain access to a namespace within their managed Kubernetes cluster. However, it's important to note that Digilab does not provide any guarantees regarding the availability of the cluster. Should our software require higher availability assurances, we may need to explore alternative solutions.
"},{"location":"adrs/0005-python-tooling/","title":"ADR-0005 Python coding standard and tools","text":""},{"location":"adrs/0005-python-tooling/#context","title":"Context","text":"In modern software development, maintaining code quality is crucial for readability, maintainability, and collaboration. Python, being a dynamically typed language, requires robust tooling to ensure code consistency and type safety. Manual enforcement of coding standards is time-consuming and error-prone. Hence, adopting automated tooling to streamline this process is imperative.
"},{"location":"adrs/0005-python-tooling/#decision","title":"Decision","text":"We will use these standards and tools for our own projects:
Working with external projects these coding standards will not always be possible. but we will try to integrate them as much as possible.
"},{"location":"adrs/0005-python-tooling/#consequences","title":"Consequences","text":"Improved Code Quality: Adoption of these tools will lead to improved code quality, consistency, and maintainability across the project.
Enhanced Developer Productivity: Automated code formatting and static type checking will reduce manual effort and free developers to focus more on coding logic rather than formatting and type-related issues.
Reduced Bug Incidence: Static typing and linting will catch potential bugs and issues early in the development process, reducing the likelihood of runtime errors and debugging efforts.
Standardized Development Workflow: By integrating pre-commit hooks, the development workflow will be standardized, ensuring that all developers follow the same code quality standards.
"},{"location":"adrs/0006-agile-tooling/","title":"ADR-0006 Agile tooling","text":""},{"location":"adrs/0006-agile-tooling/#context","title":"Context","text":"Our development team wants to enhance transparency and productivity in our software development processes. We are using GitHub for version control and collaboration. However, to further streamline our process, there is a need to incorporate tooling for managing the effort of our team.
"},{"location":"adrs/0006-agile-tooling/#decision","title":"Decision","text":"We will use GitHub Projects as our agile process tool
"},{"location":"adrs/0006-agile-tooling/#consequences","title":"Consequences","text":"GitHub Projects seamlessly integrates with our existing GitHub repositories, allowing us to manage our Agile processes. within the same ecosystem where our code resides. This integration eliminates the need for additional third-party tools, simplifying our workflow.
"},{"location":"adrs/0007-commit-convention/","title":"ADR-0007 Commit convention","text":""},{"location":"adrs/0007-commit-convention/#context","title":"Context","text":"In software development, maintaining clear and consistent commit message conventions is crucial for effective collaboration, code review, and project management. Commit messages serve as a form of documentation, helping developers understand the changes introduced by each commit without having to analyze the code diff extensively.
"},{"location":"adrs/0007-commit-convention/#decision","title":"Decision","text":"A commit message must follow the following rules:
\\<ref>-\\<ticketnumber>: subject line
An example of a commit message:
Fix foo to enable bar
or
AB-1234: Fix foo to enable bar
or
Fix foo to enable bar
This fixes the broken behavior of component abc caused by problem xyz.
If we contribute to projects not started by us we try to follow the above standard unless a specific convention is obvious or required by the project.
"},{"location":"adrs/0007-commit-convention/#consequences","title":"Consequences","text":"In some repositories Conventional Commits are used. This ADR does not follow conventional commits.
"},{"location":"adrs/0008-architectural-diagram-tooling/","title":"ADR-0008 Architectural Diagram Tooling","text":""},{"location":"adrs/0008-architectural-diagram-tooling/#context","title":"Context","text":"To communicate our designs in a graphical manner, it is of importance to draw architectural diagrams. For this we use tooling, that supports us in our work. We need to have something that is written so that it can be processed by both people and machine, and we want to have version control on our diagrams.
"},{"location":"adrs/0008-architectural-diagram-tooling/#decision","title":"Decision","text":"We will write our architectural diagrams in Markdown-like (.mmmd) in the Mermaid Syntax to edit these diagrams one can use the various plugins. For each project where it is needed, we will add the diagrams in the repository of the subject. The level of detail we will provide in the diagrams is according to the C4-model metamodel on architecture diagramming.
"},{"location":"adrs/0008-architectural-diagram-tooling/#consequences","title":"Consequences","text":"Standardized Workflow: By maintaining architecture as code, it will be standardized in our workflow.
Version control on diagrams: By using version control, we will be able to collaborate easier on the diagrams, and we will be able to see the history of them.
Diagrams are in .md format: By storing our diagrams next to our code, it will be where you need it the most.
"},{"location":"adrs/0010-container-registry/","title":"ADR-0010 Container Registry","text":""},{"location":"adrs/0010-container-registry/#context","title":"Context","text":"Containers allow us to package and run applications in a standardized and portable way. To be able to (re)use and share images, they need to be stored in a registry that is accessible by others.
There are many container registries. During research the following registries have been noted:
Docker Hub, GitHub Container Registry, Amazon Elastic Container Registry (ECR), Azure Container Registry (ACR), Google Artifact Registry (GAR), Red Hat Quay, GitLab Container Registry, Harbor, Sonatype Nexus Repository Manager, JFrog Artifactory.
"},{"location":"adrs/0010-container-registry/#assumptions","title":"Assumptions","text":"We will use GitHub Container Registry.
This aligns best with the previously made choices for GitHub as a code repository and CI/CD workflow.
"},{"location":"adrs/0010-container-registry/#risks","title":"Risks","text":"Traditionally, Docker Hub has been the place to publish images. Therefore, our images may be more difficult to discover.
The following assumptions are not (directly) covered by the chosen registry:
By using GitHub Container Registry we have a container registry we can use both internally as well as share with others. This has low impact, we can always move to another registry since the Open Container Initiative is standardized.
"},{"location":"adrs/0010-container-registry/#more-information","title":"More Information","text":"The following sites have been consulted:
The AI validation team works transparently. Working with public funds warrants transparency toward the public. Additionally, being transparent aligns with the team's mission of increasing the transparency of public organizations. In line with this reasoning, it is important to be open to researchers interested in the work of the AI validation team. Allowing researchers to conduct research within the team contributes to transparency and enables external perspectives and feedback to be incorporated into the team's work.
"},{"location":"adrs/0011-researcher-in-residence/#assumptions","title":"Assumptions","text":"We have decided to include a researcher in residence as a member of our team.
The researcher in residence takes the following form:
The following conditions apply to the researcher in residence.
Risks around a potential chilling effect (team members not feeling free to express themselves) are mitigated by the conditions we impose. In light of aforementioned form and conditions above, we see no further significant risks.
"},{"location":"adrs/0011-researcher-in-residence/#consequences","title":"Consequences","text":"Including a researcher in residence makes it easier for them to conduct research within both the team and the wider organization where the AI validation team operates. This benefits the quality of the research findings and the feedback provided to the team and organization.
"},{"location":"adrs/0012-dictionary-for-spelling/","title":"ADR-0012 Dictionary for spelling","text":""},{"location":"adrs/0012-dictionary-for-spelling/#context","title":"Context","text":"We use English as language in some of our external communications, like on GitHub. We noticed that among different documents certain words are spelled correctly but differently, depending on the author or dictionary used. Also there are occasional typos which can cause distraction and don't meet professional standards.
"},{"location":"adrs/0012-dictionary-for-spelling/#assumptions","title":"Assumptions","text":"Standardizing the used dictionary avoids discussion on spelling and makes documents consistent. Eliminating typos contributes to professional, credible and unambiguous documents.
Using a dictionary in a pre-commit hook will prevent commits being made with obvious spelling issues.
"},{"location":"adrs/0012-dictionary-for-spelling/#decision","title":"Decision","text":"We will use the U.S. English spelling dictionary.
"},{"location":"adrs/0012-dictionary-for-spelling/#risks","title":"Risks","text":"It may slow down committing large files.
"},{"location":"adrs/0012-dictionary-for-spelling/#consequences","title":"Consequences","text":"Documents will all use the same dictionary for spelling and will not contain typos.
"},{"location":"adrs/0013-date-time-representation/","title":"ADR-0013 Date Time Representation: ISO 8601","text":""},{"location":"adrs/0013-date-time-representation/#context","title":"Context","text":"In our software development projects, we have encountered ambiguity related to the representation of dates and times, particularly when dealing with time zones. The lack of a standardized approach has led to discussions and possibly ambiguity when interpreting timestamps within our applications.
"},{"location":"adrs/0013-date-time-representation/#assumptions","title":"Assumptions","text":"Standardizing the representation of dates and times will improve clarity and precision in our application's logic and user interfaces.
ISO 8601 format is better human-readable than other formats such as unix timestamps.
"},{"location":"adrs/0013-date-time-representation/#decision","title":"Decision","text":"We adopt ISO 8601 with timezone notation, preferably in UTC (Z
), as the standard method for representing dates and times in our software projects, replacing the usage of Unix timestamps or any other formats or timezones. We use both dashes (-
) and colons (:
).
We store date and time as: 2024-04-16T16:48:14Z
(preferably with Z
as timezone, representing UTC)
We store dates as 2024-04-16
.
Only when capturing client events we may want to choose to store the client timezone instead of UTC.
When rendering a date and time in a user interface, we may want to localize the date and time for the appropriate timezone.
"},{"location":"adrs/0013-date-time-representation/#risks","title":"Risks","text":"Increased storage space: ISO 8601 representations can be longer than other formats, leading to potential increases in storage requirements, especially when dealing with large datasets.
"},{"location":"adrs/0013-date-time-representation/#consequences","title":"Consequences","text":"A single ISO 8601 with UTC timezone provides a clear and unambiguous way to represent dates and times. Its format is easily recognizable and eliminates the need for interpretation. For example: 2024-04-15T10:00:00Z
can easily be understood without needing to parse it using a library.
We will need to regularly convert from localized time to UTC and back when capturing, storing, and rendering dates and times.
"},{"location":"adrs/0013-date-time-representation/#more-information","title":"More Information","text":"ISO 8601 is an internationally recognized standard endorsed by the International Organization for Standardization (ISO). Its adoption offers numerous benefits, including improved clarity, global accessibility, and future-proofing of systems and applications.
For further reading on ISO 8601:
In order to expand our reach and foster international collaboration in the field of AI Validation, we have decided to conduct all communication in English on public platforms such as GitHub. This decision aims to facilitate better understanding and participation from our global colleagues. However, within the Government of the Netherlands, the norm is to communicate in Dutch for internal purposes. This ADR will provide guidelines on which language to use for different types of communications.
"},{"location":"adrs/0014-written-language/#assumptions","title":"Assumptions","text":"There is no requirement to use Dutch as the primary language for all our activities while working for the Government of the Netherlands. More information can be found in the More Information section.
"},{"location":"adrs/0014-written-language/#decision","title":"Decision","text":"The following channels will utilize English:
The primary language for the following channels will be Dutch:
Dutch-only developers will have a harder time following along with the progression of our team on both the code on GitHub as our Project Management.
"},{"location":"adrs/0014-written-language/#consequences","title":"Consequences","text":"Although many attempts by previous cabinets, Dutch is not the official language in the Netherlands according to the Dutch constitution. See the following link.
According to the website of the Government of the Netherlands the Dutch language is the official recognized language. This means that in combination with the law Algemene wet bestuursrecht
on wetten.overheid.nl governing bodies and their employees need to communicate in Dutch unless stated differently elsewhere. It is stated here that communicating in another language than Dutch is permitted if the goal of communicating in another language than Dutch is sufficiently justified and if other parties are not effected disproportionately by the usage of another language.
Right now we have a few organizations (Logius, SSC-ICT, ODC-Noord, Tender process, and Digilab, etc...) offering IT infrastructure. This ADR will give an overview of what these different organizations are offering as well as make a decision for the AI Validation team on which infrastructure provider we will focus.
"},{"location":"adrs/0016-government-cloud-comparison/#descriptions-and-comparison","title":"Descriptions and comparison","text":"Please see the following picture for an overview of the providers in relation to what they can provide, currently we are heavily searching in the realm of unmanaged infrastructure, as we want this to manage ourselves.
"},{"location":"adrs/0016-government-cloud-comparison/#decision","title":"Decision","text":"For our infrastructure provider we decided to go with Digilab as the main source, as they can provide us with a Kubernetes namespace and are a reliable and convenient partner as we work closely with them.
"},{"location":"adrs/0016-government-cloud-comparison/#risks","title":"Risks","text":"Certain choices are made for us if we make use of the Kubernetes namespace of Digilab, for example that we need to make use of Flux for our CI/CD pipeline.
"},{"location":"adrs/0016-government-cloud-comparison/#extra-information","title":"Extra information","text":"Large Languages Models (LLMs) are becoming increasingly popular in assisting people in a variety of tasks. These tasks include, but are not limited to, information retrieval, assisting with coding and essay writing. In the context of the government, tasks can include for example supporting Freedom of Information Act (FOIA) requests and aiding in answering questions of citizens.
While the potential benefit of using LLMs is large, there are also significant risks. Basically an LLM is just a next token predictor, which bases its predictions on the user input (context) and on compressed information seen during training (LLM parameters); hence there is no guarantee on the quality and correctness of the output. Moreover, due to bias in the training data, LLMs can have bias in their output, despite best efforts to mitigate this. Additionally, we have human values that we expect LLMs to be aligned with. Certainly, within the context of a government, we should take utmost care not to discriminate. To assess the quality, correctness, bias and alignment with human values of an LLM one can perform benchmarks.
"},{"location":"projects/llm-benchmarks/#the-project","title":"The project","text":"The LLM Benchmarks project of the AI Validation Team aims to create a platform where LLMs can be measured across a wide range of benchmarks. We limit ourselves to LLMs and benchmarks that are related to the Dutch society. Both LLMs and the benchmarks can be configured by users of the platform. Users can run these benchmarks on LLMs on our platform. The intended goal of this project is to give government organizations, citizens and companies insight in the various LLMs and their quality, correctness, bias and alignment with human values. The project also encompasses a dashboard with uploaded LLMs and their performance on uploaded benchmarks. With this platform we aim to enhance public trust in the usage of LLMs and expose potential bias that exists within LLMs.
"},{"location":"projects/tad/","title":"TAD","text":"TAD is the acronym for Transparency of Algorithmic Decision making. TAD has the goal to make algorithmic systems more transparent; it achieves this by generating standardized reports on the algorithmic system which encompasses both technical aspects in addition to descriptive information about the system and regulatory assessments. For both the system and the model the lifecycle is important and this needs to be taken into account. The definition for an algorithm is derived from the Algoritmeregister.
One of the goals of the TAD project is providing a standardized format of reporting on a algorithmic system by developing a Reporting Standard. This Reporting Standard consists out of a System Card which contains Model Cards and Assessment Cards.
The final result of the project is producing System, Model and Assessment Cards with both performance metrics and technical measurements on fairness and bias of the model, assessments on the system where the specific algorithm resides, and descriptive information about the system.
The requirements and instruments are dictated by the Algoritmekader.
"},{"location":"projects/tad/comparison/","title":"Comparison of Reporting Standards","text":"This document assesses standards that standardize the way algorithm assessments can be captured.
"},{"location":"projects/tad/comparison/#background","title":"Background","text":"There are many algorithm assessments (e.g. IAMA, HUIDERIA, etc.), technical tests on performance (e.g. Accuracy, TP, FP, F1, etc), fairness and bias of algorithms (e.g. SHAP) and reporting formats available. The goal is to have a way of standardizing the way these different assessments and tests can be captured.
"},{"location":"projects/tad/comparison/#available-standards","title":"Available standards","text":""},{"location":"projects/tad/comparison/#model-cards","title":"Model Cards","text":"The most interesting existing capturing methods seem to be all based on Model Cards for Model Reporting, which are:
\"Short documents accompanying trained machine learning models that provide benchmarked evaluation in a variety of conditions, such as across different cultural, demographic, or phenotypic groups (e.g., race, geographic location, sex, Fitzpatrick skin type) and intersectional groups (e.g., age and race, or sex and Fitzpatrick skin type) that are relevant to the intended application domains. Model cards also disclose the context in which models are intended to be used, details of the performance evaluation procedures, and other relevant information\", proposed by Google. Note that \"The proposed set of sections\" in the Model Cards paper \"are intended to provide relevant details to consider, but are not intended to be complete or exhaustive, and may be tailored depending on the model, context, and stakeholders.\"
Many companies implement their own version of Model Cards, for example Meta System Cards and the tools mentioned in the next section.
"},{"location":"projects/tad/comparison/#automatic-model-card-generation","title":"Automatic model card generation","text":"There exist tools to (semi)-automatically generate models cards:
A landscape analysis of ML documentation tools has been performed by Hugging Face and provides a good overview of the current landscape.
Another interesting standard is the Algorithmic Transparency Recording Standard of the United Kingdom Government, which can be found here.
"},{"location":"projects/tad/comparison/#proposal","title":"Proposal","text":"We need a standard that captures algorithmic assessments and technical tests on model and datasets. The idea of model cards can serve as a guiding theoretical principle on how to implement such a standard. More specifically, we can draw inspiration from the existing model card schema's and implementations of VerifyML and Hugging Face. We note the following:
Hence, in any case, we need to extend one of these standards. We propose to:
In modern software development practices, the use of Architecture Decision Records (ADRs) has become increasingly common. ADRs are documents that capture important architectural decisions made during the development process. These decisions play a crucial role in guiding the development team and ensuring consistency and coherence in the architecture of the software system.
"},{"location":"projects/tad/adrs/0001-adrs/#assumptions","title":"Assumptions","text":"We will utilize ADRs in this project repository and communicate architectural decisions effectively. Furthermore, we will publish these ADRs publicly to promote transparency and facilitate collaboration.
"},{"location":"projects/tad/adrs/0001-adrs/#template","title":"Template","text":"Use the template below to add an ADR:
# TAD-XXXX Title\n\n## Context\n\nWhat is the issue that we're seeing that is motivating this decision or change?\n\n## Assumptions\n\nAnything that could cause problems if untrue now or later. (optional)\n\n## Decision\n\nWhat is the change that we're proposing and/or doing?\n\n## Risks\n\nAnything that could cause malfunction, delay, or other negative impacts. (optional)\n\n## Consequences\n\nWhat becomes easier or more difficult to do because of this change?\n\n## More Information\n\nProvide additional evidence/confidence for the decision outcome.\nLinks to other decisions and resources may appear here as well. (optional)\n
"},{"location":"projects/tad/adrs/0002-tad-reporting-standard/","title":"TAD-0002 TAD Reporting Standard","text":""},{"location":"projects/tad/adrs/0002-tad-reporting-standard/#context","title":"Context","text":"The TAD Reporting Standard proposes a standardized way of capturing information of ML-models and systems.
"},{"location":"projects/tad/adrs/0002-tad-reporting-standard/#assumptions","title":"Assumptions","text":"There is no existing standard of capturing all relevant information on ML-models that also includes fairness and bias tests and regulatory assessments.
A widely used implementation for Model Cards for Model Reporting is given by the Hugging Face Model Card metadata specification, which in turn is based on Papers with Code Model Index. This implementation does not capture sufficient details about metrics and does not include measurements from technical tests on bias and fairness or regulatory assessments.
"},{"location":"projects/tad/adrs/0002-tad-reporting-standard/#decision","title":"Decision","text":"We decided to implement a custom reporting standard. Our reporting standard can be split up into three elements.
We were heavily inspired by the Hugging Face Model Card metadata specification, which we essentially extended to allow for:
The extension is not strict, meaning that the TAD Reporting Standard is not a valid Hugging Face metadata specification. The reason is that some fields in the Hugging Face standard are too tightly intertwined with the Hugging Face ecosystem, and it would not be logical for us to couple our implementation that tightly to Hugging Face.
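A hedged illustration of the kind of extension meant here: a Hugging Face style metric entry next to the richer entry the TAD standard needs, carrying subgroup detail for fairness measurements; the keys are illustrative assumptions, not the published schema.
# Illustrative only: a Hugging Face style metric entry versus an extended\n# entry with the subgroup detail needed for fairness reporting.\nhf_style_metric = {'type': 'accuracy', 'value': 0.92}\n\ntad_style_metric = {\n    'type': 'false_positive_rate',\n    'value': 0.05,\n    # The subgroup this measurement applies to (hypothetical key).\n    'labels': [{'name': 'gender', 'value': 'female'}],\n}\n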
"},{"location":"projects/tad/adrs/0002-tad-reporting-standard/#risks","title":"Risks","text":"The TAD Reporting Standard is not fully backwards compatible with the Hugging Face Model Card metadata specification. If in the future the Hugging Face Model Card metadata specification becomes a standard, we might need to revise the TAD standard.
"},{"location":"projects/tad/adrs/0002-tad-reporting-standard/#consequences","title":"Consequences","text":"The TAD Reporting Standard allows us to capture relevant information on model performance, bias and fairness and regulatory assessments in a standardized way.
"},{"location":"projects/tad/adrs/0003-tad-tool/","title":"TAD-0003 Tool for Transparency of Algorithmic Decision making","text":""},{"location":"projects/tad/adrs/0003-tad-tool/#context","title":"Context","text":"We are considering tooling for organizations to get more grip on their algorithms. Tooling for, for instance bias and fairness tests, and assessments (like IAMA).
Transparency, we think, can be fostered by sharing reports from such a tool in a standardized way.
There are several existing open source tools which we have assessed. Some support only assessments, others already combine more features and can generate a report. There is however no tool that supports all the requirements we have.
These are the main requirements for our tool:
We will build our own solution. Where possible, this solution should re-use components of other related open-source projects.
"},{"location":"projects/tad/adrs/0003-tad-tool/#risks","title":"Risks","text":"We can develop a solution that is tailored to the needs of our stakeholders.
"},{"location":"projects/tad/adrs/0004-software-stack/","title":"TAD-0004 Software Stack for TAD","text":""},{"location":"projects/tad/adrs/0004-software-stack/#context","title":"Context","text":"For building our own TAD solution, we need to choose a software stack. During our earlier POCs and market research, we gathered insight and information on technologies to use and which not to use.
During further discussions and brainstorming sessions, a software stack was chosen that best accommodates our needs.
While more fine grained requirements are listed elsewhere, some key requirements are:
We stick to widely adopted, well-supported programming languages. As most AI-related tooling is written in Python, this language is the logical choice for our development as well.
Currently we do not see the need for a separate web GUI framework; we prefer to bundle backend and frontend in one solution.
As part of a Dutch government organization, we need to adhere to all Dutch laws and standards, like:
We will support the latest three minor versions of Python 3 as our programming language and use Poetry for dependency management.
"},{"location":"projects/tad/adrs/0004-software-stack/#backend","title":"Backend","text":"The Python backend will use the following key dependencies:
We will use server-side rendering of HTML, based on HTMX. For styling and components we will use the NL Design System.
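A minimal sketch of what server-side rendering for HTMX can look like; the FastAPI-style backend, the endpoint and the markup here are assumptions purely for illustration, not our chosen dependency set.
# Hypothetical sketch: an HTMX-friendly endpoint that returns an HTML\n# fragment, assuming a FastAPI-style backend purely for illustration.\nfrom fastapi import FastAPI\nfrom fastapi.responses import HTMLResponse\n\napp = FastAPI()\n\n@app.get('/fragments/status', response_class=HTMLResponse)\ndef status_fragment() -> str:\n    # HTMX swaps this fragment into the page; no separate JS frontend needed.\n    return '<span id=\"status\">All checks passed</span>'\n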
"},{"location":"projects/tad/adrs/0004-software-stack/#testing","title":"Testing","text":"We will use pytest for unit-testing and VCRPY and Playwright for module and integration tests.
"},{"location":"projects/tad/adrs/0004-software-stack/#database","title":"Database","text":"We will use SQLModel or SQL Alchemy with SQLite for development and postgreSQL for production.
"},{"location":"projects/tad/adrs/0004-software-stack/#risks","title":"Risks","text":"As HTMX is relatively more limited than other UI frameworks, it may lack features we require but did not anticipate.
"},{"location":"projects/tad/adrs/0004-software-stack/#consequences","title":"Consequences","text":"We have clarity about the tools to use and develop our TAD tool.
"},{"location":"projects/tad/adrs/0005-ai-verify-technical-tests/","title":"TAD-0005 Add support to run technical tests via AI Verify","text":""},{"location":"projects/tad/adrs/0005-ai-verify-technical-tests/#context","title":"Context","text":"The AI Verify project is set up in a modular way, and the technical tests is one of the modules. The AI Verify team is developing a feature which makes it possible to run the technical tests using an API: a Python library with a method to run a test and providing the required configuration; for example, which model and dataset to use and some test specific configuration.
The results of a test are returned in JSON format, which can be processed in any way we please, such as writing them to a file or System Card, or storing them in a database.
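Sketched in code, that flow could look like this; run_test and its parameters are hypothetical stand-ins for the AI Verify API described above, which was still under development at the time of writing.
import json\n\n# 'run_test' is a hypothetical stand-in for the AI Verify Python API.\ndef store_test_results(run_test):\n    results = run_test(\n        test='fairness_metrics',                 # which technical test to run\n        model_path='model.pkl',                  # the model under test\n        data_path='dataset.csv',                 # the dataset to evaluate on\n        config={'sensitive_feature': 'gender'},  # test-specific configuration\n    )\n    # Results come back as JSON-serializable data; persist them, for\n    # example into a file that feeds the System Card.\n    with open('measurements.json', 'w') as f:\n        json.dump(results, f, indent=2)\n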
"},{"location":"projects/tad/adrs/0005-ai-verify-technical-tests/#pros","title":"Pros","text":"Our technical tests will include, but may extend beyond, those offered by AI Verify.
"},{"location":"projects/tad/adrs/0005-ai-verify-technical-tests/#risks","title":"Risks","text":"The tests we use from AI Verify are tied to the AI Verify ecosystem. So it uses their (core) modules to load models and datasets. Adding support for other models or data formats, like models written in R, has to be done in the AI Verify core.
"},{"location":"projects/tad/adrs/0005-ai-verify-technical-tests/#consequences","title":"Consequences","text":"We have a set of technical tests we can integrate in the TAD tool.
"},{"location":"projects/tad/adrs/0006-extend-system-card-EU-AI-Act/","title":"TAD-0006 Include EU AI Act into System Card","text":""},{"location":"projects/tad/adrs/0006-extend-system-card-EU-AI-Act/#context","title":"Context","text":"The European Union AI Act represents a landmark regulatory framework aimed at ensuring the safe and ethical development and deployment of artificial intelligence technologies within the EU. It defines different policies and requirements for AI systems based on their risk levels, from minimal to unacceptable, to mitigate potential harms. Only for high-risk AI systems, an extended form of documentation is required, including technical documentation. This technical documentation consists of a general description of the AI system and a more detailed, in-depth description (including risk-management, monitoring, etc.).
To ensure that AI systems can be effectively audited, we aim to create a separate instrument called 'technical documentation for high-risk AI systems'. This will allow developers to easily extract and auditors to readily assess all necessary information for the technical documentation.
The RegCheck AI tool, published by Hugging Face, checks model cards for compliance with the EU AI Act. However, this prototype tool is research work, not a commercial or legal product. Furthermore, because we use a modified model card setup, its performance may be less reliable.
"},{"location":"projects/tad/adrs/0006-extend-system-card-EU-AI-Act/#assumptions","title":"Assumptions","text":"The extended system card and proposed instrument will facilitate the documentation of information in accordance with the EU AI Act using the TAD tool.
"},{"location":"projects/tad/existing-tools/checklists/ai_assesment_tool_checklist/","title":"ALTAI","text":"See the introduction. It is a discussion tool about AI Systems.
"},{"location":"projects/tad/existing-tools/checklists/ai_assesment_tool_checklist/#functionality","title":"Functionality","text":"Requirement Priority Fulfilled Comments The tool allows users to conduct technical tests on algorithms or models, including assessments of performance, bias, and fairness. To facilitate these tests, users can input relevant datasets, M 0 The tool only allows for discussions not technical tests The tool allows users to choose which tests to perform. M 0 See above The tool allows users to fill out questionnaires to conduct impact assessments for AI. For example IAMA or ALTAI. M 1 This is very well supported by the tool The tool can generate a human readable report. M 0.9 There is an export functionality for the outcomes of the assessment, it offers a print dialog The tools works with a standardized report format, that it can read, write, and update. M 0 This report cannot be re-imported in a different tool as it only exports to pdf The tool supports plugin functionality so additional tests can be added easily. S 0 Not applicable The tool allows to create custom reports based on components. S 0 The report cannot be customized by the user It is possible to add custom components for reports. S 0 See above The tool provides detailed logging, including tracking of different model versions, changes in impact assessments, and technical test results for individual runs. S 0.75 There is even for the users an extensive audit trail what happened to assessment, not different model versions The tool supports saving progress. S 1 Yes this is supported The tool can be used on an isolated system without an internet connection. S 1 Yes it can be ran locally or in a docker container without internet The tool offers options to discuss and document conversations. For example, to converse about technical tests or to collaborate on impact assessments. C 1 This is the main feature of the tool The tool operates with complete data privacy; it does not share any data or logging information. C 1 Stored locally in a mongoDB The tool allows extension of report formats functionality. C 0.5 It could be developed that we export to markdown instead of pdf, but right now it just prints the window as pdf The tool can be integrated in a CI/CD flow. C 0 It is an UI tool, so doesn't make sense in a CI/CD pipeline The tool can be offered as a (cloud) service where no local installation is required. C 1 We could host this tool for other parties to use It is possible to define and automate workflows for repetitive tasks. C 0 It is an UI tool The tool offers pre-built connectors or low-code/no-code integration options to simplify the integration process. C 0 Nototal_score = 22.85
"},{"location":"projects/tad/existing-tools/checklists/ai_assesment_tool_checklist/#reliability","title":"Reliability","text":"Requirement Priority Fulfilled Comments The tool operates consistently and reliably, meaning it delivers the same expected results every time you use it. M 1 Yes The tool recovers automatically from common failures. S 1 The tool seems too do this The tool recovers from failures quickly, minimizing data loss, for example by automatically saving intermediate test progress results. S 1 The data is stored in mongoDB, so no data is lost The tool handles errors gracefully and informs users of any issues. S 1 If the email server is down the tool still operates The tool provides clear error messages and instructions for troubleshooting. S 0.8 Some errors are not very informative when you get them, but mostly email related aretotal_score = 15.4
"},{"location":"projects/tad/existing-tools/checklists/ai_assesment_tool_checklist/#usability","title":"Usability","text":"Requirement Priority Fulfilled Comments The tool possess a clean, intuitive, and visually appealing UI that follows industry standards. S 1 Very clean UI The tool provides clear and consistent navigation, making it easy for users to find what they need. S 1 Compared to AIVerify the navigation is very intuitive (but it also has less features) The tool is responsive and provides instant feedback. S 1 Yes The user interface is multilingual and supports at least English. S 0.8 There is support for multilingual, but the assessments are not translated and needs to be translated by hand The tool offers keyboard shortcuts for efficient interaction. C 0 No The user interface can easily be translated into other languages. C 0.8 The buttons are automatically translated but not the assessment itselftotal_score = 13
"},{"location":"projects/tad/existing-tools/checklists/ai_assesment_tool_checklist/#help-documentation","title":"Help & Documentation","text":"Requirement Priority Fulfilled Comments The tool provides comprehensive online help documentation with searchable functionalities. S 0.1 There is little documentation, only the website and the github readme The tool offers context-sensitive help within the application. C 0 The icons are just very clear, would be nice to have a question mark at some places The online documentation includes video tutorials and training materials for ease of learning. C 0 There is no such documentation The project provides readily available customer support through various channels (e.g., email, phone, online chat) to address user inquiries and troubleshoot issues. C 0.25 You can issue tickets on Github, no other way supported waytotal_score = 0.55
"},{"location":"projects/tad/existing-tools/checklists/ai_assesment_tool_checklist/#performance-efficiency","title":"Performance Efficiency","text":"Requirement Priority Fulfilled Comments The tool operates efficiently and minimize resource utilization. M 1 The docker container is not so very big, also doesn't use much resources The tool responds to user actions instantly. M 1 There is instant feedback in the UI The tool is scalable to accommodate increased user base and data volume. S 1 As it runs on Docker, you can scale this on Kubernetes for multiple userstotal_score = 11
"},{"location":"projects/tad/existing-tools/checklists/ai_assesment_tool_checklist/#maintainability","title":"Maintainability","text":"Requirement Priority Fulfilled Comments The tool is easy to modify and maintain. M 0.8 You need to be a bit aware of NextJS, then it is easy to maintain as it is not such a large tool The tool adheres to industry coding standards and best practices to ensure code quality and maintainability. M 0.8 The code looks well structured, they have deployments on github but I don't see any CI or pre-commit hooks The code is written in a common, widely adopted and supported and actively used and maintained programming language. M 1 NextJS is very common for frontend tools The project provides version control for code changes and rollback capabilities. M 1 The code is hosted on Github so yes The project is open source. M 1 see above It is possible to contribute to the source. S 1 It is possible, not many people have done this yet The system is modular, allowing for easy modification of individual components. S 0.6 Extra assessments can be appended to the system, but not in such a way that it supports multiple (different) assessments, but roles can be changed very easily Diagnostic tools are available to identify and troubleshoot issues. S 0.8 The standard NextJS tools to troubleshoot, but not many teststotal_score = 25.6
"},{"location":"projects/tad/existing-tools/checklists/ai_assesment_tool_checklist/#security","title":"Security","text":"Requirement Priority Fulfilled Comments The tool must protect data and system from unauthorized access, use, disclosure, disruption, modification, or destruction. M 1 The data is stored in MongoDB Regular security audits and penetration testing are conducted. S 0 When running docker compose up, the docker client will tell there are quite some CVE vulnerabilities in there, an upgrade of the Node version would help much here The tool enforce authorization controls based on user roles and permissions, restricting access to sensitive data and functionalities. C 0.5 The tool has support for multiple users and roles (but we couldn't find a user management system) Data encryption is used for sensitive information at rest and in transit. C 1 When data is transferred to mongoDB, a secure connection is set-up and also in the DB it is encrypted by MongoDB, also you have an SSL connection with the tool The project allows for regular security audits and penetration testing to identify vulnerabilities and ensure system integrity. C 1 The tool does allow this, as it is open-source The tool implements backup functionality to ensure data availability in case of incidents. C 1 The data is store in a volume next to the main container of thetotal_score = 7.5
"},{"location":"projects/tad/existing-tools/checklists/ai_assesment_tool_checklist/#compatibility","title":"Compatibility","text":"Requirement Priority Fulfilled Comments The tool is compatible with existing systems and infrastructure. M 1 As it is a container it can run on Kubernetes and therefore at Digilab The tool supports industry-standard data formats and protocols. M 1 Assessment and other config are stored in JSON The tool operates seamlessly on supported operating systems and hardware platforms. S 1 As it runs in a container it is able to run on all the major OSes if you have Docker Desktop or use a cloud version managed by yourself The tool supports commonly used data formats (e.g., CSV, Excel, JSON) for easy data exchange with other systems and tools. S 0 The tool currently only exports a pdf which is not an exchangeable format The tool integrates with existing security solutions. C 0 Not applicable as it is an UItotal_score = 11
"},{"location":"projects/tad/existing-tools/checklists/ai_assesment_tool_checklist/#accessibility","title":"Accessibility","text":"Requirement Priority Fulfilled Comments The tool is accessible to users with disabilities, following relevant accessibility standards (e.g., WCAG). S 0.1 The color scheme is pretty good viewable, but for the rest there are not accessibility featurestotal_score = 0.3
"},{"location":"projects/tad/existing-tools/checklists/ai_assesment_tool_checklist/#portability","title":"Portability","text":"Requirement Priority Fulfilled Comments The tool support a range of operating systems (e.g., Windows, macOS, Linux) commonly used within an organization. S 1 It is in docker so can run everywhere The tool minimizes dependencies on specific hardware or software configurations, promoting flexibility. S 1 This is all containerized The tool offers a cloud-based deployment option or be compatible with cloud environments for scalability and accessibility. S 1 As it is containerized we could host this ourselves in a cloud environment, the Belgium government does not offer a hosted version for you The tool adheres to relevant cloud security standards and best practices. S 0.8 The docker container does contain some outdated versions of for example Node.total_score = 11.4
"},{"location":"projects/tad/existing-tools/checklists/ai_assesment_tool_checklist/#deployment","title":"Deployment","text":"Requirement Priority Fulfilled Comments The tool has an easy and user-friendly installation and configuration process. S 1 It was very easy to install out-of-the-box The tool has on-premise or cloud-based deployment options to cater to different organizational needs and infrastructure. S 0 The tool does not promise on-prem or cloud-based managed deploymentstotal_score = 3
"},{"location":"projects/tad/existing-tools/checklists/ai_assesment_tool_checklist/#legal-compliance","title":"Legal & Compliance","text":"Requirement Priority Fulfilled Comments It is clear how the tool is funded to avoid improper influence due to conflicts of interest M 1 It is funded by the Belgian Government The tool is compliant with relevant legal and regulatory requirements. S 1 Yes EU license The tool adheres to (local) data privacy regulations like GDPR, ensuring the protection of user data. S 1 Data is stored locally The tool implements appropriate security measures to comply with industry regulations and standards. S 1 EUPL 1.2 license (although they say they have MIT license) The tool is licensed for use within the organization according to the terms and conditions of the license agreement. S 1 Yes, see above The tool respects intellectual property rights and avoid copyright infringement issues. S 1 Yes, see abovetotal_score = 19
"},{"location":"projects/tad/existing-tools/checklists/aiverify_checklist/","title":"AI Verify","text":"See the introduction
"},{"location":"projects/tad/existing-tools/checklists/aiverify_checklist/#functionality","title":"Functionality","text":"Requirement Priority Fulfilled Comments The tool allows users to conduct technical tests on algorithms or models, including assessments of performance, bias, and fairness. To facilitate these tests, users can input relevant datasets, M 1 This is core functionality of AIVerify The tool allows users to choose which tests to perform. M 1 This is core functionality of AIVerify The tool allows users to fill out questionnaires to conduct impact assessments for AI. For example IAMA or ALTAI. M 1 This is core functionality of AIVerify, however work is needed to add extra impact assessments The tool can generate a human readable report. M 1 This is core functionality of AIVerify The tools works with a standardized report format, that it can read, write, and update. M 0 The outputted format is a PDF format, so this cannot be updated, or easily read by a machine. The tool supports plugin functionality so additional tests can be added easily. S 0.5 One can add a test as a plugin, it can however be a bit too technical still for many people. The tool allows to create custom reports based on components. S 1 One can slide the technical tests results and the assessment test results into a report which will be placed into a PDF It is possible to add custom components for reports. S 1 It is possible, but just like with tests can be hard for non-technical people The tool provides detailed logging, including tracking of different model versions, changes in impact assessments, and technical test results for individual runs. S 0.5 There are versions of models when uploaded, and the report itself is the technical test result of a run. Changes to impact assessments are not logged (only when a report is generated) The tool supports saving progress. S 1 Reports can be saved, while it is being constructed The tool can be used on an isolated system without an internet connection. S 1 Locally the docker container can be build and ran The tool offers options to discuss and document conversations. For example, to converse about technical tests or to collaborate on impact assessments. C 0 Only the end-result will be logged into the report The tool operates with complete data privacy; it does not share any data or logging information. C 1 The application is a docker application and does not do this The tool allows extension of report formats functionality. C 1 We could program this functionality in the tool and submit a PR The tool can be integrated in a CI/CD flow. C 0.5 It is possible, but would be very heavy to do so. The build time is quite large, and only the technical tests could be ran in an automated fashion The tool can be offered as a (cloud) service where no local installation is required. C 0 AIVerify is currently not doing this, we could however offer it as a cloud service It is possible to define and automate workflows for repetitive tasks. C 0 As this tool is focused on UI, this is not possible The tool offers pre-built connectors or low-code/no-code integration options to simplify the integration process. C 0 This is not includedtotal_score = 36
"},{"location":"projects/tad/existing-tools/checklists/aiverify_checklist/#reliability","title":"Reliability","text":"Requirement Priority Fulfilled Comments The tool operates consistently and reliably, meaning it delivers the same expected results every time you use it. M 1 The tool did not break down a single time while we were coding a plugin (only threw errors) The tool recovers automatically from common failures. S 1 Common failures like missing datasets or models are not breaking The tool recovers from failures quickly, minimizing data loss, for example by automatically saving intermediate test progress results. S 0.5 The assessments you need to manually save otherwise it will be lost, but over different sessions the data will be stored persistent even if the containers go down. Test results are only stored in the generated report The tool handles errors gracefully and informs users of any issues. S 1 When failed to generate a report the tool will log the error messages, otherwise when loading in data that is non existing the application (while not being very clear in error message) just continues with an error The tool provides clear error messages and instructions for troubleshooting. S 0.5 The test-engine-core is a dependency that is installed as a package, and therefore the error message will not contain error in that packagetotal_score = 13
"},{"location":"projects/tad/existing-tools/checklists/aiverify_checklist/#usability","title":"Usability","text":"Requirement Priority Fulfilled Comments The tool possess a clean, intuitive, and visually appealing UI that follows industry standards. S 1 The tool does follow the material design principles for example when you hover over items they will respond to user input The tool provides clear and consistent navigation, making it easy for users to find what they need. S 0.5 It is not completely clear where in the tool you are when interacting with it and sometimes you could go back to home but not always The tool is responsive and provides instant feedback. S 1 Even for jobs like generating tests and the report, it scheduled jobs and will notify you when it is done The user interface is multilingual and supports at least English. S 0.5 Currently it only supports english The tool offers keyboard shortcuts for efficient interaction. C 0 It is mainly UI and therefore no keyboard shortcuts The user interface can easily be translated into other languages. C 0.2 It would need quite some refactoring when adding support for the Dutch Language (especially the more technical words like Warning or the metadata on all the pluginstotal_score = 9.4
"},{"location":"projects/tad/existing-tools/checklists/aiverify_checklist/#help-documentation","title":"Help & Documentation","text":"Requirement Priority Fulfilled Comments The tool provides comprehensive online help documentation with searchable functionalities. S 0.8 From the end-user perspective yes, from the development perspective no (for example that you need to rebuild packages like the test-engine-core The tool offers context-sensitive help within the application. C 0 Not included in the tool The online documentation includes video tutorials and training materials for ease of learning. C 0 Although it contains many images The project provides readily available customer support through various channels (e.g., email, phone, online chat) to address user inquiries and troubleshoot issues. C 0.2 Just email, which they do not respond to very quicklytotal_score = 2.8
"},{"location":"projects/tad/existing-tools/checklists/aiverify_checklist/#performance-efficiency","title":"Performance Efficiency","text":"Requirement Priority Fulfilled Comments The tool operates efficiently and minimize resource utilization. M 0.5 The tool is efficient, minimal waiting and no lag although it uses up quite some resources which could be optimized The tool responds to user actions instantly. M 1 Instantaneous response time The tool is scalable to accommodate increased user base and data volume. S 0.5 As it is built into a container it can be made scalable with Kubernetes, but the the tool itself can become very slow when generating results for a large dataset and model (because of the extra overhead)total_score = 7.5
"},{"location":"projects/tad/existing-tools/checklists/aiverify_checklist/#maintainability","title":"Maintainability","text":"Requirement Priority Fulfilled Comments The tool is easy to modify and maintain. M 0.2 Adding a new plugin for a model type was quite hard, other plugins however are more easier The tool adheres to industry coding standards and best practices to ensure code quality and maintainability. M 0.2 The docker side of the project could have a big improvement The code is written in a common, widely adopted and supported and actively used and maintained programming language. M 1 Backend in Python, Frontend in NextJs The project provides version control for code changes and rollback capabilities. M 0.8 The code is stored on Github, but the container itself not and also the packages which the tools depend on not The project is open source. M 1 Github link It is possible to contribute to the source. S 0.5 It is possible, although with our three features it takes a while for them to dedicated time for integration The system is modular, allowing for easy modification of individual components. S 0.5 The technical tests and assessments are easy to adjust, other core features not Diagnostic tools are available to identify and troubleshoot issues. S 0 Diagnosing some parts of the system took us quite some time as we couldn't properly debug in the containerstotal_score = 15.8
"},{"location":"projects/tad/existing-tools/checklists/aiverify_checklist/#security","title":"Security","text":"Requirement Priority Fulfilled Comments The tool must protect data and system from unauthorized access, use, disclosure, disruption, modification, or destruction. M 0.5 This managed by that the data is stored in MongoDB however, it currently only has 1 user support Regular security audits and penetration testing are conducted. S 0.1 We are unaware of the security audits but they do have a security policy here The tool enforce authorization controls based on user roles and permissions, restricting access to sensitive data and functionalities. C 0 Currently only 1 user can use the system and see all the data Data encryption is used for sensitive information at rest and in transit. C 1 When data is transferred to mongoDB, a secure connection is set-up and also in the DB it is encrypted by MongoDB, also you have an SSL connection with the tool The project allows for regular security audits and penetration testing to identify vulnerabilities and ensure system integrity. C 1 As you can install it locally, this is possible The tool implements backup functionality to ensure data availability in case of incidents. C 1 Data is stored persistent, so even if the tool breaks the data will be in volumestotal_score = 8.3
"},{"location":"projects/tad/existing-tools/checklists/aiverify_checklist/#compatibility","title":"Compatibility","text":"Requirement Priority Fulfilled Comments The tool is compatible with existing systems and infrastructure. M 1 As it is a container it can run on Kubernetes and therefore at Digilab The tool supports industry-standard data formats and protocols. M 1 Most Datasets and Models are supported by the tool The tool operates seamlessly on supported operating systems and hardware platforms. S 1 As it runs in a container it is able to run on all the major OS'es if you have Docker Desktop or use a cloud version managed by yourself The tool supports commonly used data formats (e.g., CSV, Excel, JSON) for easy data exchange with other systems and tools. S 0.5 As input many types are accepted, but only as export there is a PDF report The tool integrates with existing security solutions. C 0 It does not integrate with security solutionstotal_score = 12.5
"},{"location":"projects/tad/existing-tools/checklists/aiverify_checklist/#accessibility","title":"Accessibility","text":"Requirement Priority Fulfilled Comments The tool is accessible to users with disabilities, following relevant accessibility standards (e.g., WCAG). S 0 It is not clear what the tool actually does with one look, also the color change when hovering over elements is not a large difference compared to the original color (the purple and pink)total_score = 0
"},{"location":"projects/tad/existing-tools/checklists/aiverify_checklist/#portability","title":"Portability","text":"Requirement Priority Fulfilled Comments The tool support a range of operating systems (e.g., Windows, macOS, Linux) commonly used within an organization. S 1 It is containerized The tool minimizes dependencies on specific hardware or software configurations, promoting flexibility. S 1 This is all containerized The tool offers a cloud-based deployment option or be compatible with cloud environments for scalability and accessibility. S 1 As it is containerized we could host this ourselves in a cloud environment The tool adheres to relevant cloud security standards and best practices. S 0.5 The making of the container it self is lacking some best practices, otherwise the cloud security standards are not applicable as it is a self-hosted tooltotal_score = 10.5
"},{"location":"projects/tad/existing-tools/checklists/aiverify_checklist/#deployment","title":"Deployment","text":"Requirement Priority Fulfilled Comments The tool has an easy and user-friendly installation and configuration process. S 0.5 You need to be technical to be able to install and deploy, but then it is relatively easy The tool has on-premise or cloud-based deployment options to cater to different organizational needs and infrastructure. S 0 The tool does not promise on-prem or cloud-based managed deploymentstotal_score = 1.5
"},{"location":"projects/tad/existing-tools/checklists/aiverify_checklist/#legal-compliance","title":"Legal & Compliance","text":"Requirement Priority Fulfilled Comments It is clear how the tool is funded to avoid improper influence due to conflicts of interest M 1 On the website it is stated, that many commercial partners fund this project The tool is compliant with relevant legal and regulatory requirements. S 1 The tool adheres to (local) data privacy regulations like GDPR, ensuring the protection of user data. S 1 The tool implements appropriate security measures to comply with industry regulations and standards. S 1 The tool is licensed for use within the organization according to the terms and conditions of the license agreement. S 1 Apache 2.0 license The tool respects intellectual property rights and avoid copyright infringement issues. S 1total_score = 19
"},{"location":"projects/tad/existing-tools/checklists/holisticai_checklist/","title":"Holistic AI","text":"See the introduction. It is a toolkit just like IBM-360-Toolkit for a data scientist to research bias and also to mitigate it immediately.
"},{"location":"projects/tad/existing-tools/checklists/holisticai_checklist/#functionality","title":"Functionality","text":"Requirement Priority Fulfilled Comments The tool allows users to conduct technical tests on algorithms or models, including assessments of performance, bias, and fairness. To facilitate these tests, users can input relevant datasets, M 1 The tests which can be executed are written here The tool allows users to choose which tests to perform. M 1 In code the user is free to choose any test The tool allows users to fill out questionnaires to conduct impact assessments for AI. For example IAMA or ALTAI. M 0 The tool only does technical tests The tool can generate a human readable report. M 0 The toolkit itself cannot make a human readable report, it only generates results which then needs to be interpreted The tools works with a standardized report format, that it can read, write, and update. M 0 The only format it outputs are specific numbers, so no standardized format or even een report format The tool supports plugin functionality so additional tests can be added easily. S 0 All the bias tests are put in a single script which making additional tests a bit cumbersome and leas developer-friendly The tool allows to create custom reports based on components. S 0 Does not allow reports export It is possible to add custom components for reports. S 0 Does not allow reports export The tool provides detailed logging, including tracking of different model versions, changes in impact assessments, and technical test results for individual runs. S 0 Not ouf of the box, but this could be written in code by the owner of the algorithm The tool supports saving progress. S 0 Not ouf of the box, but this could be written in code by the owner of the algorithm The tool can be used on an isolated system without an internet connection. S 1 As a python tool this is possible The tool offers options to discuss and document conversations. For example, to converse about technical tests or to collaborate on impact assessments. C 0 This is not supported The tool operates with complete data privacy; it does not share any data or logging information. C 1 The local tool does not share anything to the outside world The tool allows extension of report formats functionality. C 0 This is not what the tool is built for The tool can be integrated in a CI/CD flow. C 1 As it is a python package it can be included in a CI pipeline The tool can be offered as a (cloud) service where no local installation is required. C 0 Not immediately, an UI needs to be build around it It is possible to define and automate workflows for repetitive tasks. C 1 Automated tests could be programmed specifically from this tool The tool offers pre-built connectors or low-code/no-code integration options to simplify the integration process. C 0 Not supported by the tooltotal_score = 17
"},{"location":"projects/tad/existing-tools/checklists/holisticai_checklist/#reliability","title":"Reliability","text":"Requirement Priority Fulfilled Comments The tool operates consistently and reliably, meaning it delivers the same expected results every time you use it. M 1 The tool recovers automatically from common failures. S 1 The tool recovers from failures quickly, minimizing data loss, for example by automatically saving intermediate test progress results. S 1 The tool handles errors gracefully and informs users of any issues. S 1 The tool provides clear error messages and instructions for troubleshooting. S 1total_score = 16
"},{"location":"projects/tad/existing-tools/checklists/holisticai_checklist/#usability","title":"Usability","text":"Requirement Priority Fulfilled Comments The tool possess a clean, intuitive, and visually appealing UI that follows industry standards. S 0 There is no user-interface The tool provides clear and consistent navigation, making it easy for users to find what they need. S 0 There is no user-interface The tool is responsive and provides instant feedback. S 0 There is no user-interface The user interface is multilingual and supports at least English. S 0 There is no user-interface The tool offers keyboard shortcuts for efficient interaction. C 0 There is no user-interface The user interface can easily be translated into other languages. C 0 There is no user-interfacetotal_score = 0
"},{"location":"projects/tad/existing-tools/checklists/holisticai_checklist/#help-documentation","title":"Help & Documentation","text":"Requirement Priority Fulfilled Comments The tool provides comprehensive online help documentation with searchable functionalities. S 0.2 There is some documentation but it is not very helpful The tool offers context-sensitive help within the application. C 0 As a Python tool, no The online documentation includes video tutorials and training materials for ease of learning. C 0 Ths is not there The project provides readily available customer support through various channels (e.g., email, phone, online chat) to address user inquiries and troubleshoot issues. C 0.5 You can contact sales through their website and respond on Github, Github seems to be an okay response time (but not a large community)total_score = 1.6
"},{"location":"projects/tad/existing-tools/checklists/holisticai_checklist/#performance-efficiency","title":"Performance Efficiency","text":"Requirement Priority Fulfilled Comments The tool operates efficiently and minimize resource utilization. M 1 very lightweight as a python package The tool responds to user actions instantly. M 1 It will return output instantly The tool is scalable to accommodate increased user base and data volume. S 1 This would be installed distributed and therefore would be scalable, with large datasets it is still very quicktotal_score = 11
"},{"location":"projects/tad/existing-tools/checklists/holisticai_checklist/#maintainability","title":"Maintainability","text":"Requirement Priority Fulfilled Comments The tool is easy to modify and maintain. M 0.5 It is less modular because most of the tests are written in a single script The tool adheres to industry coding standards and best practices to ensure code quality and maintainability. M 0.5 They use pre-commit hooks, but the codebase seems to be a bit weirdly structured The code is written in a common, widely adopted and supported and actively used and maintained programming language. M 1 It is written in Python The project provides version control for code changes and rollback capabilities. M 1 It is hosted on Github The project is open source. M 1 Hosted here It is possible to contribute to the source. S 1 It is possible and they respond to contributions The system is modular, allowing for easy modification of individual components. S 0.5 See the first point Diagnostic tools are available to identify and troubleshoot issues. S 1 Just standard python troubleshooting toolstotal_score = 23.5
"},{"location":"projects/tad/existing-tools/checklists/holisticai_checklist/#security","title":"Security","text":"Requirement Priority Fulfilled Comments The tool must protect data and system from unauthorized access, use, disclosure, disruption, modification, or destruction. M 0 Not applicable Regular security audits and penetration testing are conducted. S 0 It is not stated on the repository that they do something with security The tool enforce authorization controls based on user roles and permissions, restricting access to sensitive data and functionalities. C 0 The tool does not have Users or Access control Data encryption is used for sensitive information at rest and in transit. C 0 Transitionary data is not stored The project allows for regular security audits and penetration testing to identify vulnerabilities and ensure system integrity. C 1 This is not blocked by the tool The tool implements backup functionality to ensure data availability in case of incidents. C 0 Not supportedtotal_score = 2
"},{"location":"projects/tad/existing-tools/checklists/holisticai_checklist/#compatibility","title":"Compatibility","text":"Requirement Priority Fulfilled Comments The tool is compatible with existing systems and infrastructure. M 1 It can be imported in Python The tool supports industry-standard data formats and protocols. M 0 it does not standardize at all in the output of the tests The tool operates seamlessly on supported operating systems and hardware platforms. S 1 Python can be ran on any system The tool supports commonly used data formats (e.g., CSV, Excel, JSON) for easy data exchange with other systems and tools. S 1 If it can be imported in Python/R it is supported The tool integrates with existing security solutions. C 0 Not applicabletotal_score = 10
"},{"location":"projects/tad/existing-tools/checklists/holisticai_checklist/#accessibility","title":"Accessibility","text":"Requirement Priority Fulfilled Comments The tool is accessible to users with disabilities, following relevant accessibility standards (e.g., WCAG). S 0 You need to be a programmer to use it, and that is not your typical user with disabilitiestotal_score = 0
"},{"location":"projects/tad/existing-tools/checklists/holisticai_checklist/#portability","title":"Portability","text":"Requirement Priority Fulfilled Comments The tool support a range of operating systems (e.g., Windows, macOS, Linux) commonly used within an organization. S 0.5 As it is a python tool it is supported anywhere python runs The tool minimizes dependencies on specific hardware or software configurations, promoting flexibility. S 1 It is a python tool The tool offers a cloud-based deployment option or be compatible with cloud environments for scalability and accessibility. S 1 The company behind Holistic AI offers a whole range of services included an UI which uses this open-source toolkit The tool adheres to relevant cloud security standards and best practices. S 0 On their website they do not speak about where the data of their solution will go, this is not very transparenttotal_score = 7.5
"},{"location":"projects/tad/existing-tools/checklists/holisticai_checklist/#deployment","title":"Deployment","text":"Requirement Priority Fulfilled Comments The tool has an easy and user-friendly installation and configuration process. S 0.2 You need to have some developer knowledge and also knowledge about the technical tests to use The tool has on-premise or cloud-based deployment options to cater to different organizational needs and infrastructure. S 1 Yes the tool can be used as a cloud-based deployment but then with a whole UI around ittotal_score = 3.6
"},{"location":"projects/tad/existing-tools/checklists/holisticai_checklist/#legal-compliance","title":"Legal & Compliance","text":"Requirement Priority Fulfilled Comments It is clear how the tool is funded to avoid improper influence due to conflicts of interest M 1 The tool is owned by a private company but has been made open source to the public The tool is compliant with relevant legal and regulatory requirements. S 1 Under the apache 2.0 license The tool adheres to (local) data privacy regulations like GDPR, ensuring the protection of user data. S 1 Data stays locally The tool implements appropriate security measures to comply with industry regulations and standards. S 0 The repo does not speak about security at all The tool is licensed for use within the organization according to the terms and conditions of the license agreement. S 1 Under the apache 2.0 license The tool respects intellectual property rights and avoid copyright infringement issues. S 1total_score = 16
"},{"location":"projects/tad/existing-tools/checklists/ibm_360_research_toolkit_checklist/","title":"IBM Research 360 Toolkit","text":"See the introduction, same thing as verifyML this has no frontend baked in, but has some nice integrations with MLops tooling like Kubeflow Pipelines. The IBM Research 360 toolkit is actually a collection of three open-source toolkits as stated by their Github repo; AI Fairness 360, AI Explainability 360, Adversarial Robustness 360. The strong suite of this toolkit that it considers bias in the whole lifecycle of the model; (dataset, training, output).
"},{"location":"projects/tad/existing-tools/checklists/ibm_360_research_toolkit_checklist/#functionality","title":"Functionality","text":"Requirement Priority Fulfilled Comments The tool allows users to conduct technical tests on algorithms or models, including assessments of performance, bias, and fairness. To facilitate these tests, users can input relevant datasets, M 1 Fairness, Explainability and security can be tested with the suite of tools The tool allows users to choose which tests to perform. M 1 The websites of contain a whole explanation of which tests to perform AIF Website, AIX website, ART website The tool allows users to fill out questionnaires to conduct impact assessments for AI. For example IAMA or ALTAI. M 0 The tool only does technical tests The tool can generate a human readable report. M 0 The toolkit itself cannot make a human readable report, it only generates results which then needs to be interpreted The tools works with a standardized report format, that it can read, write, and update. M 0 The only format it outputs are specific numbers, so no standardized format or even een report format The tool supports plugin functionality so additional tests can be added easily. S 1 Only the repository new tests could be added quite easily if you understand Python The tool allows to create custom reports based on components. S 0 The tool does not generate reports It is possible to add custom components for reports. S 0 The tool does not generate reports The tool provides detailed logging, including tracking of different model versions, changes in impact assessments, and technical test results for individual runs. S 0 Not ouf of the box, but this could be written in code by the owner of the algorithm The tool supports saving progress. S 0 Not ouf of the box, but this could be written in code by the owner of the algorithm The tool can be used on an isolated system without an internet connection. S 1 As it can be imported as a python or r library The tool offers options to discuss and document conversations. For example, to converse about technical tests or to collaborate on impact assessments. C 0 This is not supported, there is no UI The tool operates with complete data privacy; it does not share any data or logging information. C 1 The tool does not share data The tool allows extension of report formats functionality. C 0 The tool does not generate reports The tool can be integrated in a CI/CD flow. C 1 As it is a programming toolkit it can be used in a CI/CD pipeline The tool can be offered as a (cloud) service where no local installation is required. C 0 not immediately, then an UI needs to be made It is possible to define and automate workflows for repetitive tasks. C 1 We could automate specific tests which we deem necessary or standard The tool offers pre-built connectors or low-code/no-code integration options to simplify the integration process. C 0 Purely written in Pythontotal_score = 20
"},{"location":"projects/tad/existing-tools/checklists/ibm_360_research_toolkit_checklist/#reliability","title":"Reliability","text":"Requirement Priority Fulfilled Comments The tool operates consistently and reliably, meaning it delivers the same expected results every time you use it. M 1 The tool recovers automatically from common failures. S 1 The tool recovers from failures quickly, minimizing data loss, for example by automatically saving intermediate test progress results. S 1 The tool handles errors gracefully and informs users of any issues. S 1 The tool provides clear error messages and instructions for troubleshooting. S 1total_score = 16
"},{"location":"projects/tad/existing-tools/checklists/ibm_360_research_toolkit_checklist/#usability","title":"Usability","text":"Requirement Priority Fulfilled Comments The tool possess a clean, intuitive, and visually appealing UI that follows industry standards. S 0 There is no user-interface The tool provides clear and consistent navigation, making it easy for users to find what they need. S 0 There is no user-interface The tool is responsive and provides instant feedback. S 0 There is no user-interface The user interface is multilingual and supports at least English. S 0 There is no user-interface The tool offers keyboard shortcuts for efficient interaction. C 0 There is no user-interface The user interface can easily be translated into other languages. C 0 There is no user-interfacetotal_score = 0
"},{"location":"projects/tad/existing-tools/checklists/ibm_360_research_toolkit_checklist/#help-documentation","title":"Help & Documentation","text":"Requirement Priority Fulfilled Comments The tool provides comprehensive online help documentation with searchable functionalities. S 0.8 On the website of the specific toolkit you can find many docs but you cannot search The tool offers context-sensitive help within the application. C 0 Within the application (as it is not an UI, does not offer specific help) The online documentation includes video tutorials and training materials for ease of learning. C 1 The amount of tutorials is extensive even videos of its usage The project provides readily available customer support through various channels (e.g., email, phone, online chat) to address user inquiries and troubleshoot issues. C 1 You can ask questions at the repository, but also in slack and many people are using thistotal_score = 6.4
"},{"location":"projects/tad/existing-tools/checklists/ibm_360_research_toolkit_checklist/#performance-efficiency","title":"Performance Efficiency","text":"Requirement Priority Fulfilled Comments The tool operates efficiently and minimize resource utilization. M 1 very lightweight as a python package The tool responds to user actions instantly. M 1 It will return output instantly The tool is scalable to accommodate increased user base and data volume. S 1 This would be installed distributed and therefore would be scalable, with large datasets it is still very quicktotal_score = 11
"},{"location":"projects/tad/existing-tools/checklists/ibm_360_research_toolkit_checklist/#maintainability","title":"Maintainability","text":"Requirement Priority Fulfilled Comments The tool is easy to modify and maintain. M 1 The repositories are very well structured and therefore easy to adjust The tool adheres to industry coding standards and best practices to ensure code quality and maintainability. M 1 Although it doesn't have pre-commit hooks it does have a CONTRIBUTING.rst where the rules of good practices are written down The code is written in a common, widely adopted and supported and actively used and maintained programming language. M 1 It is written in Python The project provides version control for code changes and rollback capabilities. M 1 The code is hosted on Github The project is open source. M 1 At the beginning of this doc you can find the links to the repositories It is possible to contribute to the source. S 1 They have merged many outside requests, so this is fine The system is modular, allowing for easy modification of individual components. S 1 Tests can very easily be added if you understand Python Diagnostic tools are available to identify and troubleshoot issues. S 1 Just standard python troubleshooting toolstotal_score = 29
"},{"location":"projects/tad/existing-tools/checklists/ibm_360_research_toolkit_checklist/#security","title":"Security","text":"Requirement Priority Fulfilled Comments The tool must protect data and system from unauthorized access, use, disclosure, disruption, modification, or destruction. M 0 not applicable Regular security audits and penetration testing are conducted. S 0 It is not stated on the repository that they do something with security The tool enforce authorization controls based on user roles and permissions, restricting access to sensitive data and functionalities. C 0 The tool does not have Users or Access control Data encryption is used for sensitive information at rest and in transit. C 0 Transitionary data is not stored The project allows for regular security audits and penetration testing to identify vulnerabilities and ensure system integrity. C 1 This is not blocked by the tool The tool implements backup functionality to ensure data availability in case of incidents. C 0 Not supportedtotal_score = 2
"},{"location":"projects/tad/existing-tools/checklists/ibm_360_research_toolkit_checklist/#compatibility","title":"Compatibility","text":"Requirement Priority Fulfilled Comments The tool is compatible with existing systems and infrastructure. M 1 It can easily be imported in Python or R The tool supports industry-standard data formats and protocols. M 0.5 It does not standardize really on any output from the tests The tool operates seamlessly on supported operating systems and hardware platforms. S 1 As a python and R tool it can be run on systems where these can be ran The tool supports commonly used data formats (e.g., CSV, Excel, JSON) for easy data exchange with other systems and tools. S 1 These can be used if they are imported in python and R The tool integrates with existing security solutions. C 1 The Adversarial Robustness Toolbox can be used to test for the security of AI Systemstotal_score = 14
"},{"location":"projects/tad/existing-tools/checklists/ibm_360_research_toolkit_checklist/#accessibility","title":"Accessibility","text":"Requirement Priority Fulfilled Comments The tool is accessible to users with disabilities, following relevant accessibility standards (e.g., WCAG). S 0 You need to be a programmer to use it, and that is not your typical user with disabilitiestotal_score = 0
"},{"location":"projects/tad/existing-tools/checklists/ibm_360_research_toolkit_checklist/#portability","title":"Portability","text":"Requirement Priority Fulfilled Comments The tool support a range of operating systems (e.g., Windows, macOS, Linux) commonly used within an organization. S 0.7 If you can run python, which is not always possible within the government for example, but R could be more easy to be run on places The tool minimizes dependencies on specific hardware or software configurations, promoting flexibility. S 1 Just a python tool, no UI which is fairly minimal The tool offers a cloud-based deployment option or be compatible with cloud environments for scalability and accessibility. S 0 It is not offered as a cloud-based option The tool adheres to relevant cloud security standards and best practices. S 0 Not relevanttotal_score = 5.1
"},{"location":"projects/tad/existing-tools/checklists/ibm_360_research_toolkit_checklist/#deployment","title":"Deployment","text":"Requirement Priority Fulfilled Comments The tool has an easy and user-friendly installation and configuration process. S 0.4 You need to have some developer knowledge and also knowledge about the technical tests to use. But then it is quite easy and works fairly quickly The tool has on-premise or cloud-based deployment options to cater to different organizational needs and infrastructure. S 0 Not applicabletotal_score = 1.2
"},{"location":"projects/tad/existing-tools/checklists/ibm_360_research_toolkit_checklist/#legal-compliance","title":"Legal & Compliance","text":"Requirement Priority Fulfilled Comments It is clear how the tool is funded to avoid improper influence due to conflicts of interest M 1 The tool was from IBM, but slowly they are removing the IBM branding from this and the tool is now owned by the LF AI Foundation (where big companies are part of) The tool is compliant with relevant legal and regulatory requirements. S 1 All three tools have apache 2.0 license The tool adheres to (local) data privacy regulations like GDPR, ensuring the protection of user data. S 1 Data will stay local The tool implements appropriate security measures to comply with industry regulations and standards. S 0 Nothing is known about the security measures of the toolkits The tool is licensed for use within the organization according to the terms and conditions of the license agreement. S 1 All three tools have apache 2.0 license The tool respects intellectual property rights and avoid copyright infringement issues. S 1 The specific tests are implementations of papers which are open for everyonetotal_score = 16
"},{"location":"projects/tad/existing-tools/checklists/verifyml_checklist/","title":"VerifyML","text":"See the introduction, the maker also suggests to use an front-end tool to collaboratively change the model card. Model Card Editor this is not open-source and also the developer suggests in this issue to not use this tool but to use tools like AIVerify. This checklist only looks at the verifyML python toolkit and not the web interface.
"},{"location":"projects/tad/existing-tools/checklists/verifyml_checklist/#functionality","title":"Functionality","text":"Requirement Priority Fulfilled Comments The tool allows users to conduct technical tests on algorithms or models, including assessments of performance, bias, and fairness. To facilitate these tests, users can input relevant datasets, M 1 The tool does allow a few standardized tests, specified here The tool allows users to choose which tests to perform. M 1 In code the user is free to choose any test The tool allows users to fill out questionnaires to conduct impact assessments for AI. For example IAMA or ALTAI. M 0 The tool can generate a human readable report. M 1 The tool can visualize model cards that are generated by it The tools works with a standardized report format, that it can read, write, and update. M 1 It generates html which can be imported by a machine The tool supports plugin functionality so additional tests can be added easily. S 1 Any test can be ran by the user itself and the output imported in the model card generated by the tool The tool allows to create custom reports based on components. S 0 It doesn't offer any standardization in what to put in the report It is possible to add custom components for reports. S 1 Anything can be put in the model card, which makes it very flexible The tool provides detailed logging, including tracking of different model versions, changes in impact assessments, and technical test results for individual runs. S 0 Not ouf of the box, but this could be written in code by the owner of the algorithm The tool supports saving progress. S 1 Once the modelcard is generated it could be loaded in again and be changed The tool can be used on an isolated system without an internet connection. S 1 Once the tool is imported in python it can be used without an internet connection The tool offers options to discuss and document conversations. For example, to converse about technical tests or to collaborate on impact assessments. C 0 Assessments are not supported The tool operates with complete data privacy; it does not share any data or logging information. C 1 It does not do this The tool allows extension of report formats functionality. C 1 As it exports html, it can also be transferred to json or markdown The tool can be integrated in a CI/CD flow. C 1 The automated tests could be ran in the CI/CD tool to generated a model card The tool can be offered as a (cloud) service where no local installation is required. C 0 The python tool itself not, but a frontend which needs to be developed yes It is possible to define and automate workflows for repetitive tasks. C 1 As it is written in python this can be automated easily The tool offers pre-built connectors or low-code/no-code integration options to simplify the integration process. C 0 The tool does this nottotal_score = 42
"},{"location":"projects/tad/existing-tools/checklists/verifyml_checklist/#reliability","title":"Reliability","text":"Requirement Priority Fulfilled Comments The tool operates consistently and reliably, meaning it delivers the same expected results every time you use it. M 1 Once you have located the right (older) libraries it runs pretty smoothly and reliably The tool recovers automatically from common failures. S 0 Library dependencies needs to be solved by yourself as this is not handled by the tool (especially graphs) The tool recovers from failures quickly, minimizing data loss, for example by automatically saving intermediate test progress results. S 0 It does not store any intermediary results The tool handles errors gracefully and informs users of any issues. S 0 It just breaks, you need to explicitly export the model card for it to saved The tool provides clear error messages and instructions for troubleshooting. S 0 The error messages are python error messages unrelated to the tooltotal_score = 4
"},{"location":"projects/tad/existing-tools/checklists/verifyml_checklist/#usability","title":"Usability","text":"Requirement Priority Fulfilled Comments The tool possess a clean, intuitive, and visually appealing UI that follows industry standards. S 0 There is no user interface The tool provides clear and consistent navigation, making it easy for users to find what they need. S 0 There is no user interface The tool is responsive and provides instant feedback. S 0 There is no user interface The user interface is multilingual and supports at least English. S 0 There is no user interface The tool offers keyboard shortcuts for efficient interaction. C 0 There is no user interface The user interface can easily be translated into other languages. C 0 There is no user interfacetotal_score = 0
"},{"location":"projects/tad/existing-tools/checklists/verifyml_checklist/#help-documentation","title":"Help & Documentation","text":"Requirement Priority Fulfilled Comments The tool provides comprehensive online help documentation with searchable functionalities. S 0.5 The documentation is quite concise and helpful, but it is outdated The tool offers context-sensitive help within the application. C 0 No context info whatsoever The online documentation includes video tutorials and training materials for ease of learning. C 0 Just documentation The project provides readily available customer support through various channels (e.g., email, phone, online chat) to address user inquiries and troubleshoot issues. C 0 The people who worked on the tool are quick to respond to issues, but they don't support the tool anymoretotal_score = 1.5
"},{"location":"projects/tad/existing-tools/checklists/verifyml_checklist/#performance-efficiency","title":"Performance Efficiency","text":"Requirement Priority Fulfilled Comments The tool operates efficiently and minimize resource utilization. M 1 Very lightweight tool, as it is a python package The tool responds to user actions instantly. M 1 When run, it returns instantly The tool is scalable to accommodate increased user base and data volume. S 1 This would be installed distributed and therefore would be scalable, with large datasets it is still very quicktotal_score = 11
"},{"location":"projects/tad/existing-tools/checklists/verifyml_checklist/#maintainability","title":"Maintainability","text":"Requirement Priority Fulfilled Comments The tool is easy to modify and maintain. M 1 The tool itself it not so large and written with tools we are all quite aware of The tool adheres to industry coding standards and best practices to ensure code quality and maintainability. M 1 The repository has poetry, pre-commit hooks, has a CI, and looks well structured The code is written in a common, widely adopted and supported and actively used and maintained programming language. M 1 in Python and jupyter notebooks The project provides version control for code changes and rollback capabilities. M 1 It is hosted on Github The project is open source. M 1 Apache 2.0 license It is possible to contribute to the source. S 0 The project is not active supported anymore, so we would need to make a fork and make that the main source The system is modular, allowing for easy modification of individual components. S 0.5 The idea of a model card is pretty modular, and can be changed any way we like. Adding assessments in the tool would be quite the effort Diagnostic tools are available to identify and troubleshoot issues. S 1 Just standard python troubleshooting toolstotal_score = 24.5
"},{"location":"projects/tad/existing-tools/checklists/verifyml_checklist/#security","title":"Security","text":"Requirement Priority Fulfilled Comments The tool must protect data and system from unauthorized access, use, disclosure, disruption, modification, or destruction. M 0 not applicable Regular security audits and penetration testing are conducted. S 0 As the tool is not actively maintained anymore The tool enforce authorization controls based on user roles and permissions, restricting access to sensitive data and functionalities. C 0 As this is a local import only, this is managed by the developer Data encryption is used for sensitive information at rest and in transit. C 0 Intermediary data is not stored, and the end result is put in html with no encryption The project allows for regular security audits and penetration testing to identify vulnerabilities and ensure system integrity. C 1 It does not block this for users to do this The tool implements backup functionality to ensure data availability in case of incidents. C 0 Not supportedtotal_score = 2
"},{"location":"projects/tad/existing-tools/checklists/verifyml_checklist/#compatibility","title":"Compatibility","text":"Requirement Priority Fulfilled Comments The tool is compatible with existing systems and infrastructure. M 1 It can be easily imported and installed in python The tool supports industry-standard data formats and protocols. M 1 Standardized tests are used and the output format is html The tool operates seamlessly on supported operating systems and hardware platforms. S 1 As it is a python tool, anywhere where python can run this can also be run The tool supports commonly used data formats (e.g., CSV, Excel, JSON) for easy data exchange with other systems and tools. S 1 This can be imported The tool integrates with existing security solutions. C 0 It does not do such a thingtotal_score = 14
"},{"location":"projects/tad/existing-tools/checklists/verifyml_checklist/#accessibility","title":"Accessibility","text":"Requirement Priority Fulfilled Comments The tool is accessible to users with disabilities, following relevant accessibility standards (e.g., WCAG). S 0 You need to be a programmer to use it, and that is not your typical user with disabilitiestotal_score = 0
"},{"location":"projects/tad/existing-tools/checklists/verifyml_checklist/#portability","title":"Portability","text":"Requirement Priority Fulfilled Comments The tool support a range of operating systems (e.g., Windows, macOS, Linux) commonly used within an organization. S 0.5 If you can run python, which is not always possible within the government for example The tool minimizes dependencies on specific hardware or software configurations, promoting flexibility. S 1 As it is a python tool The tool offers a cloud-based deployment option or be compatible with cloud environments for scalability and accessibility. S 0 It is not offered as a cloud-based option The tool adheres to relevant cloud security standards and best practices. S 0 On the github nothing is mentioned about security and for the cloud version it is not applicabletotal_score = 4.5
"},{"location":"projects/tad/existing-tools/checklists/verifyml_checklist/#deployment","title":"Deployment","text":"Requirement Priority Fulfilled Comments The tool has an easy and user-friendly installation and configuration process. S 0.2 You need to have some developer knowledge and also knowledge about the technical tests to use The tool has on-premise or cloud-based deployment options to cater to different organizational needs and infrastructure. S 0 Not applicabletotal_score = 0.6
"},{"location":"projects/tad/existing-tools/checklists/verifyml_checklist/#legal-compliance","title":"Legal & Compliance","text":"Requirement Priority Fulfilled Comments It is clear how the tool is funded to avoid improper influence due to conflicts of interest M 1 It was developed during a competition and it does not receive funding anymore The tool is compliant with relevant legal and regulatory requirements. S 1 Under the apache 2.0 license The tool adheres to (local) data privacy regulations like GDPR, ensuring the protection of user data. S 1 Data will stay local The tool implements appropriate security measures to comply with industry regulations and standards. S 0 The repo does not speak about security at all The tool is licensed for use within the organization according to the terms and conditions of the license agreement. S 1 Under the apache 2.0 license The tool respects intellectual property rights and avoid copyright infringement issues. S 1total_score = 16
"},{"location":"projects/tad/existing-tools/comparison/requirements/","title":"Requirements for tools for Transparency of Algorithmic Decision making","text":"This document contains a checklist with requirements for tools we could use to help with the transparency of algorithmic decision making.
The requirements are based on:
The requirements have been given a priority based on the MoSCoW scale to allow for tool comparison.
"},{"location":"projects/tad/existing-tools/comparison/requirements/#functionality","title":"Functionality","text":"Requirement Priority The tool allows users to conduct technical tests on algorithms or models, including assessments of performance, bias, and fairness. To facilitate these tests, users can input relevant datasets, M The tool allows users to choose which tests to perform. M The tool allows users to fill out questionnaires to conduct impact assessments for AI. For example IAMA or ALTAI. M The tool can generate a human readable report. M The tools works with a standardized report format, that it can read, write, and update. M The tool supports plugin functionality so additional tests can be added easily. S The tool allows to create custom reports based on components. S It is possible to add custom components for reports. S The tool provides detailed logging, including tracking of different model versions, changes in impact assessments, and technical test results for individual runs. S The tool supports saving progress. S The tool can be used on an isolated system without an internet connection. S The tool offers options to discuss and document conversations. For example, to converse about technical tests or to collaborate on impact assessments. C The tool operates with complete data privacy; it does not share any data or logging information. C The tool allows extension of report formats functionality. C The tool can be integrated in a CI/CD flow. C The tool can be offered as a (cloud) service where no local installation is required. C It is possible to define and automate workflows for repetitive tasks. C The tool offers pre-built connectors or low-code/no-code integration options to simplify the integration process. C"},{"location":"projects/tad/existing-tools/comparison/requirements/#reliability","title":"Reliability","text":"Requirement Priority The tool operates consistently and reliably, meaning it delivers the same expected results every time you use it. M The tool recovers automatically from common failures. S The tool recovers from failures quickly, minimizing data loss, for example by automatically saving intermediate test progress results. S The tool handles errors gracefully and informs users of any issues. S The tool provides clear error messages and instructions for troubleshooting. S"},{"location":"projects/tad/existing-tools/comparison/requirements/#usability","title":"Usability","text":"Requirement Priority The tool possess a clean, intuitive, and visually appealing UI that follows industry standards. S The tool provides clear and consistent navigation, making it easy for users to find what they need. S The tool is responsive and provides instant feedback. S The user interface is multilingual and supports at least English. S The tool offers keyboard shortcuts for efficient interaction. C The user interface can easily be translated into other languages. C"},{"location":"projects/tad/existing-tools/comparison/requirements/#help-documentation","title":"Help & Documentation","text":"Requirement Priority The tool provides comprehensive online help documentation with searchable functionalities. S The tool offers context-sensitive help within the application. C The online documentation includes video tutorials and training materials for ease of learning. C The project provides readily available customer support through various channels (e.g., email, phone, online chat) to address user inquiries and troubleshoot issues. 
C"},{"location":"projects/tad/existing-tools/comparison/requirements/#performance-efficiency","title":"Performance Efficiency","text":"Requirement Priority The tool operates efficiently and minimize resource utilization. M The tool responds to user actions instantly. M The tool is scalable to accommodate increased user base and data volume. S"},{"location":"projects/tad/existing-tools/comparison/requirements/#maintainability","title":"Maintainability","text":"Requirement Priority The tool is easy to modify and maintain. M The tool adheres to industry coding standards and best practices to ensure code quality and maintainability. M The code is written in a common, widely adopted and supported and actively used and maintained programming language. M The project provides version control for code changes and rollback capabilities. M The project is open source. M It is possible to contribute to the source. S The system is modular, allowing for easy modification of individual components. S Diagnostic tools are available to identify and troubleshoot issues. S"},{"location":"projects/tad/existing-tools/comparison/requirements/#security","title":"Security","text":"Requirement Priority The tool must protect data and system from unauthorized access, use, disclosure, disruption, modification, or destruction. M Regular security audits and penetration testing are conducted. S The tool enforce authorization controls based on user roles and permissions, restricting access to sensitive data and functionalities. C Data encryption is used for sensitive information at rest and in transit. C The project allows for regular security audits and penetration testing to identify vulnerabilities and ensure system integrity. C The tool implements backup functionality to ensure data availability in case of incidents. C"},{"location":"projects/tad/existing-tools/comparison/requirements/#compatibility","title":"Compatibility","text":"Requirement Priority The tool is compatible with existing systems and infrastructure. M The tool supports industry-standard data formats and protocols. M The tool operates seamlessly on supported operating systems and hardware platforms. S The tool supports commonly used data formats (e.g., CSV, Excel, JSON) for easy data exchange with other systems and tools. S The tool integrates with existing security solutions. C"},{"location":"projects/tad/existing-tools/comparison/requirements/#accessibility","title":"Accessibility","text":"Requirement Priority The tool is accessible to users with disabilities, following relevant accessibility standards (e.g., WCAG). S"},{"location":"projects/tad/existing-tools/comparison/requirements/#portability","title":"Portability","text":"Requirement Priority The tool support a range of operating systems (e.g., Windows, macOS, Linux) commonly used within an organization. S The tool minimizes dependencies on specific hardware or software configurations, promoting flexibility. S The tool offers a cloud-based deployment option or be compatible with cloud environments for scalability and accessibility. S The tool adheres to relevant cloud security standards and best practices. S"},{"location":"projects/tad/existing-tools/comparison/requirements/#deployment","title":"Deployment","text":"Requirement Priority The tool has an easy and user-friendly installation and configuration process. S The tool has on-premise or cloud-based deployment options to cater to different organizational needs and infrastructure. 
S"},{"location":"projects/tad/existing-tools/comparison/requirements/#legal-compliance","title":"Legal & Compliance","text":"Requirement Priority It is clear how the tool is funded to avoid improper influence due to conflicts of interest M The tool is compliant with relevant legal and regulatory requirements. S The tool adheres to (local) data privacy regulations like GDPR, ensuring the protection of user data. S The tool implements appropriate security measures to comply with industry regulations and standards. S The tool is licensed for use within the organization according to the terms and conditions of the license agreement. S The tool respects intellectual property rights and avoid copyright infringement issues. S"},{"location":"projects/tad/existing-tools/comparison/tools/","title":"Research of tools for Transparency of Algorithmic Decision making","text":"In our ongoing research on AI validation and transparency, we are seeking tools to support assessments. Ideal tools would combine various technical tests with checklists and questionnaires and have the ability to generate reports in both human-friendly and machine-exchangeable formats.
This document contains a list of tools we have found and may want to investigate further.
"},{"location":"projects/tad/existing-tools/comparison/tools/#ai-verify","title":"AI Verify","text":"AI Verify is an AI governance testing framework and software toolkit that validates the performance of AI systems against a set of internationally recognized principles through standardized tests, and is consistent with international AI governance frameworks such as those from European Union, OECD and Singapore.
Links: AI Verify Homepage, AI Verify documentation, AI Verify GitHub.
"},{"location":"projects/tad/existing-tools/comparison/tools/#to-investigate-further","title":"To investigate further","text":""},{"location":"projects/tad/existing-tools/comparison/tools/#verifyml","title":"VerifyML","text":"What is it? VerifyML is an opinionated, open-source toolkit and workflow to help companies implement human-centric AI practices. It seems pretty much equivalent to AI Verify.
Why interesting? The functionality of this toolkit seems to match closely with those of AI Verify. It has a \"git and code first approach\" and has automatic generation of model cards.
Remarks The code seems to have been last updated two years ago.
Links: VerifyML, VerifyML GitHub
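To give an impression of the workflow, here is a rough sketch of generating a model card with the toolkit. This is an assumption-laden illustration: it presumes the API mirrors Google's model-card-toolkit, from which VerifyML is derived, so module and method names may differ per version.

# Hypothetical sketch of generating a model card with VerifyML.
# Assumes the API mirrors Google's model-card-toolkit, from which
# VerifyML is derived; method names may differ per version.
import verifyml.model_card_toolkit as mctlib

# Scaffold a new model card in a local output directory
mct = mctlib.ModelCardToolkit(output_dir='model_card_output')
model_card = mct.scaffold_assets()

# Fill in descriptive fields; test output can be added in the same way
model_card.model_details.name = 'loan-approval-classifier'
model_card.model_details.overview = 'Toy model used to illustrate the workflow.'

# Persist the card and render it as HTML for human consumption
mct.update_model_card(model_card)
html = mct.export_format(model_card=model_card, output_file='model_card.html')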
"},{"location":"projects/tad/existing-tools/comparison/tools/#ibm-research-360-toolkit","title":"IBM Research 360 Toolkit","text":"What is it? Open source Python libraries that supports interpretability and explainability of datasets and machine learning models. Most relevant toolkits are the AI Fairness 360 and AI Explainability 360.
Why interesting? Seems to encompass extensive fairness and explainability tests. Codebase seems to be active.
Remarks It comes as Python and R libraries.
Links: AI Fairness 360 GitHub, AI Explainability 360 GitHub.
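To give a feel for the developer experience, the sketch below computes two group fairness metrics with AI Fairness 360. The toy dataframe and column names are invented for illustration; the aif360 classes shown are the library's documented entry points.

# Minimal sketch: group fairness metrics with AI Fairness 360.
# The toy dataframe and its column names are invented for illustration.
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

df = pd.DataFrame({
    'income': [30, 45, 28, 60, 52, 33],
    'sex': [0, 1, 0, 1, 1, 0],    # protected attribute (0 = unprivileged)
    'label': [0, 1, 1, 1, 0, 1],  # favorable outcome = 1
})

dataset = BinaryLabelDataset(
    df=df,
    label_names=['label'],
    protected_attribute_names=['sex'],
)

metric = BinaryLabelDatasetMetric(
    dataset,
    unprivileged_groups=[{'sex': 0}],
    privileged_groups=[{'sex': 1}],
)

# Values near 1.0 (disparate impact) and 0.0 (statistical parity
# difference) indicate the dataset treats both groups similarly.
print(metric.disparate_impact())
print(metric.statistical_parity_difference())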
"},{"location":"projects/tad/existing-tools/comparison/tools/#holistic-ai","title":"Holistic AI","text":"What is it? Open source tool to assess and improve the trustworthiness of AI systems. Offers tools to measure and mitigate bias across numerous tasks. Will be extended to include tools for efficacy, robustness, privacy and explainability.
Why interesting? Although it is not entirely clear what exactly this tool does (see Remarks), it does seem (according to their website) to provide reports on bias and fairness. The GitHub repository does not seem to include any report-generating code, but mainly technical tests. Here is an example in which bias is measured in a classification model.
Remarks Website seems to suggest the possibility to generate reports, but this is not directly reflected in the codebase. Possibly reports are only available with some sort of licensed product?
Links: Holistic AI Homepage, Holistic AI GitHub.
"},{"location":"projects/tad/existing-tools/comparison/tools/#ai-assessment-tool","title":"AI Assessment Tool","text":"What is it? The tool is based on the ALTAI published by the European Commission. It is more of a discussion tool about AI Systems.
Why interesting? Although it only includes questionnaires, it does offer an interesting way of reporting the end results. Discussions on, for example, the IAMA can also be documented within the tool.
Remarks The tool of the EU itself is not open source, but the tool from Belgium is. Does not include any technical tests at this point.
Links: AI Assessment Tool Belgium homepage, AI Assessment Tool Belgium GitHub
"},{"location":"projects/tad/existing-tools/comparison/tools/#interesting-to-mention","title":"Interesting to mention","text":"What-if. Provides interface for expanding understanding of a black-box classification or regression ML model. Can be accessed through TensorBoard or as an extension in a Jupyter or Colab notebook. Does not seem to be an active codebase.
Aequitas. Open source bias auditing and Fair ML toolkit. This already seems to be contained within AI Verify, at least the 'fairness tree'.
Facets. Open source toolkit for understanding and analyzing ML datasets. Note that it does not include ML models.
Fairness Indicators. Open source Python package which enables easy computation of commonly-identified fairness metrics for binary and multiclass classifiers. Part of TensorFlow.
Fairlearn. Open source Python package that empowers developers of AI systems to assess their system's fairness and mitigate any observed unfairness issues (a usage sketch follows at the end of this list).
Dalex. The DALEX package x-rays any model, helps to explore and explain its behavior, and helps to understand how complex models work. The main function explain() creates a wrapper around a predictive model. Wrapped models may then be explored and compared with a collection of local and global explainers. It builds on recent developments in the area of Interpretable Machine Learning/eXplainable Artificial Intelligence (a usage sketch follows at the end of this list).
SigmaRed. The SigmaRed platform enables comprehensive third-party AI risk management (AI TPRM) and rapidly reduces the cycle time of conducting AI risk assessments, while providing deep visibility, control, stakeholder-based reporting, and a detailed evidence repository. Does not seem to be open source.
Anch.ai. An end-to-end cloud solution that empowers global data-driven organizations to govern and deploy responsible, transparent, and explainable AI aligned with the upcoming EU AI Act. Does not seem to be open source.
CredoAI. Credo AI is an AI governance platform that helps companies adopt, scale, and govern AI safely and effectively. Does not seem to be open source.
Paper by TNO about the FATE system. The acronym stands for \"FAir, Transparent and Explainable Decision Making.\"
Tools mentioned include some of the above: Aequitas, AI Fairness 360, Dalex, Fairlearn, Responsibly, and What-If-Tool.
Links: Paper, Article, Microsoft links.
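As promised above, a minimal sketch of a Fairlearn fairness assessment. The data is invented for illustration; MetricFrame and demographic_parity_difference are part of fairlearn.metrics.

# Sketch of a Fairlearn fairness assessment; toy data for illustration.
from fairlearn.metrics import MetricFrame, demographic_parity_difference
from sklearn.metrics import accuracy_score

y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]
sex = ['f', 'f', 'f', 'm', 'm', 'm', 'f', 'm']  # sensitive feature

# Accuracy overall and per subgroup of the sensitive feature
mf = MetricFrame(metrics=accuracy_score, y_true=y_true, y_pred=y_pred,
                 sensitive_features=sex)
print(mf.overall)
print(mf.by_group)

# A single disparity number over the predictions
print(demographic_parity_difference(y_true, y_pred, sensitive_features=sex))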
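And a minimal sketch for Dalex: in the Python port, the explain() wrapper described above corresponds to the dx.Explainer class. The model and dataset are chosen purely for illustration.

# Sketch of wrapping a model with Dalex; in the Python port the R
# explain() function corresponds to the dx.Explainer class.
import dalex as dx
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

# The wrapper exposes a collection of local and global explainers
explainer = dx.Explainer(model, X, y, label='rf')
print(explainer.model_performance().result)         # global: performance
print(explainer.model_parts().result)               # global: variable importance
print(explainer.predict_parts(X.iloc[[0]]).result)  # local: one prediction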
"},{"location":"projects/tad/existing-tools/comparison/tools_comparison/","title":"Comparison of tools for transparency of algorithmic decision making","text":"We have researched a few tools which we want to investigate further, this document is the next step in that investigation. We created a checklist to compare these tools against. The Fulfilled column will give a numerical value based on whether that requirement is fulfilled or not between 0 and 1. Then the actual scoring is the fulfilled value times the priority (the priority is translated to numerical values in the following way: {M:4, S:3, C:2, W:-1}).
"},{"location":"projects/tad/existing-tools/comparison/tools_comparison/#summary-of-the-comparison","title":"Summary of the comparison","text":"Requirement AIVerify VerifyML IBM 360 Research Toolkit Holistic AI AI Assessment Tool Functionality 36 42 20 17 22.85 Reliability 13 4 16 16 15.4 Usability 9.4 0 0 0 13 Help & Documentation 2.8 1.5 6.4 1.6 0.55 Performance Efficiency 7.5 11 11 11 11 Maintainability 15.8 24.5 29 23.5 25.6 Security 8.3 2 2 2 7.5 Compatibility 12.5 14 14 10 11 Accessibility 0 0 0 0 0.3 Portability 10.5 4.5 5.1 7.5 11.4 Deployment 1.5 0.6 1.2 3.6 3 Legal & Compliance 19 16 16 16 19 Total 136.3 120.1 120.7 108.2 140.6"},{"location":"projects/tad/existing-tools/comparison/tools_comparison/#notable-differences-between-the-tools","title":"Notable differences between the tools","text":"AIVerify notes:
Technical tests are supported, but they can be quite slow because of the tool's overhead
More flexibility would need to be built in before people could use the technical tests
If you have many variables, they cannot be shown in the PDF
The error messages explaining why technical tests don't work on a model are not user-friendly
VerifyML notes:
This tool is not actively developed anymore; the parties involved have transferred their focus to AIVerify
This tool does not support assessments
IBM 360 toolkit notes:
The toolkit has strong backing from industry and the community
Many technical tests from the latest research are included, and mitigation algorithms are also supported
It is purely for developers and therefore has no support for assessments
Holistic AI:
Like the IBM 360 Toolkit it differentiates between different types of technical assessments, such as bias and explainability, but it is less extensive than the 360 toolkit
Holistic AI's ambitions are large: they want to cover efficacy, robustness, and privacy tests as well
It is a private company from the United Kingdom which has open-sourced part of its tool
AI Assessment Tool:
This tool does not have any technical tests, but outshines the others with its option to discuss assessments
It is also very performant
AIVerify
is a tool with a UI to execute both assessments and technical tests.
VerifyML
is a Python package to generate Model Cards.
Holistic AI
is a Python package to test for and mitigate Bias in your model.
IBM 360 Research Toolkit
is a Python and R package to test for Fairness & Explainability of your model.
AI Assessment Tool
is a tool with a UI to execute assessments and log discussions.
This document describes the Transparency of Algorithmic Decision making (TAD) Reporting Standard.
For reproducibility, governance, auditing and sharing of algorithmic systems it is essential to have a reporting standard so that information about an algorithmic system can be shared. This reporting standard describes how information about the different phases of an algorithm's life cycle can be reported. It contains, among other things, descriptive information combined with information about the technical tests and assessments applied.
Disclaimer
The TAD Reporting Standard is work in progress. This means that the current standard is probably suboptimal and will change significantly in future versions.
"},{"location":"projects/tad/reporting-standard/#introduction","title":"Introduction","text":"Inspired by Model Cards for Model Reporting and Papers with Code Model Index this standard almost1 2 3 4 extends the Hugging Face model card metadata specification to allow for:
metrics_field
from the Hugging Face metadata specification.measurements
.assessments
.Following Hugging Face, this proposed standard will be written in YAML.
This standard does not contain all fields present in the Hugging Face metadata specification. The fields that are optional in the Hugging Face specification and are specific to the Hugging Face interface are omitted.
Another difference is that we divide our implementation into three separate parts.
system_card
, containing information about a group of ML-models which accomplish a specific task.model_card
, containing information about a specific data science model.assessment_card
, containing information about a regulatory assessment.Include statements
These model_card
s and assessment_card
s can be included verbatim into a system_card
, or referenced with an !include
statement, allowing for minimal cards to be compact in a single file. Extensive cards can be split up for readability and maintainability. Our standard allows for the !include
to be used anywhere.
The standard will be written in YAML. Example YAML files are given in the next section. The standard defines three cards: a system_card
, a model_card
and an assessment_card
. A system_card
contains information about an algorithmic system. It can have multiple models and each of these models should have a model_card
. Regulatory assessments can be processed in an assessment_card
. Note that model_card
's and assessment_card
's can be included directly into the system_card
or can be included as separate YAML files with help of a YAML-include mechanism. For clarity the latter is preferred and is also used in the examples in the next section.
system_card
","text":"A system_card
contains the following information.
schema_version
(REQUIRED, string). Version of the schema used, for example \"0.1a2\".provenance
(OPTIONAL). In case this System Card is generated from another source file, this field can capture the historical context of the contents of this System Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this System Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, i.e. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of person that initiated the transformations.name
(OPTIONAL, string). Name used to describe the system.
upl
(OPTIONAL, string). If this algorithm is part of a product offered by the Dutch Government, it should contain a URI from the Uniform Product List.owners
(OPTIONAL, list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the system. If oin
is NOT provided this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.description
(OPTIONAL, string). A short description of the system.
labels
(OPTIONAL, list). This field allows storing meta information about a system. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). Name of the label.value
(OPTIONAL, string). Value of the label.status
(OPTIONAL, string). The status of the system. For example the status can be \"production\".
publication_category
(OPTIONAL, enum[string]). The publication category of the algorithm should be chosen from [\"high_risk\", other\"]
.begin_date
(OPTIONAL, string). The first date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.end_date
(OPTIONAL, string). The last date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.goal_and_impact
(OPTIONAL, string). The purpose of the system and the impact it has on citizens and companies.considerations
(OPTIONAL, string). The pros and cons of using the system.risk_management
(OPTIONAL, string). Description of the risks associated with the system.human_intervention
(OPTIONAL, string). A description of to what extent there is human involvement in the system.legal_base
(OPTIONAL, list). If there exists a legal base for the process the system is embedded in, this field can be filled in with the relevant laws. There can be multiple legal bases. For each legal base the following fields are present.
name
(OPTIONAL, string). Name of the law.link
(OPTIONAL, string). URI pointing towards the contents of the law.used_data
(OPTIONAL, string). An overview of the data that is used in the system.
technical_design
(OPTIONAL, string). Description of how the system works.external_providers
(OPTIONAL, list). If relevant, these fields allow to store information on external providers. There can be multiple external providers.
name
(OPTIONAL, string). Name of the external provider.version
(OPTIONAL, string). Version of the external provider reflecting its relation to previous versions.references
(OPTIONAL, list[string]). Additional reference URIs that point to relevant information about the system.
interaction_details
(OPTIONAL, list[string]). Explain how the AI system interacts with hardware or software, including other AI systems, or how the AI system can be used to interact with hardware or software.version_requirements
(OPTIONAL, list[string]). Describe the versions of the relevant software or firmware, and any requirements related to version updates.deployment_variants
(OPTIONAL, list[string]). Description of all the forms in which the AI system is placed on the market or put into service, such as software packages embedded into hardware, downloads, or APIs.hardware_requirements
(OPTIONAL, list[string]). Provide a description of the hardware on which the AI system must be run.product_markings
(OPTIONAL, list[string]). If the AI system is a component of products, photos, or illustrations, describe the external features, markings, and internal layout of those products.user_interface
(OPTIONAL, list). Provide information on the user interface provided to the user responsible for its operation.
description
(OPTIONAL, string). A description of the provided user interface.link
(OPTIONAL, string). A link to the user interface can be included.snapshot
(OPTIONAL, string). A snapshot/screenshot of the user interface can be included with the use of a hyperlink.models
(OPTIONAL, list[ModelCard]). A list of model cards (as defined below) or !include
s of a YAML file containing a model card. This model card can for example be a model card described in the next section or a model card from Hugging Face. There can be multiple model cards, meaning multiple models are used.
assessments
(OPTIONAL, list[AssessmentCard]). A list of assessment cards (as defined below) or !include
s of a YAML file containing a assessment card. This assessment card is an assessment card described in the next section. There can be multiple assessment cards, meaning multiple assessment were performed.
model_card
","text":"A model_card
contains the following information.
provenance
(OPTIONAL). In case this Model Card is generated from another source file, this field can capture the historical context of the contents of this Model Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this Model Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, i.e. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of person that initiated the transformations.language
(OPTIONAL, list[string]). If relevant, the natural languages the model supports in ISO 639. There can be multiple languages.
license
(REQUIRED).
license_name
(REQUIRED, string). Any license from the open source license list1. If the license is NOT present in the license list this field must be set to 'other' and the following two fields will be REQUIRED.license_link
(OPTIONAL, string). A link to a file of that name inside the repo, or a URL to a remote file containing the license contents.tags
(OPTIONAL, list[string]). Tags with keywords to describe the project. There can be multiple tags.
owners
(OPTIONAL, list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the model. If oin
is NOT provided this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.model_index
(REQUIRED, list). There can be multiple models. For each model the following fields are present.
name
(REQUIRED, string). The name of the model.model
(REQUIRED, string). A URI pointing to a repository containing the model file.artifacts
(OPTIONAL, list). A list of artifacts
uri
(OPTIONAL, string). URI that refers to a relevant model artifact.content-type
(OPTIONAL, string). Optional type, following the Content-Type convention. Recognized values are \"application/onnx\", to refer to an ONNX representation of the model.md5-checksum
(OPTIONAL, string) Optional checksum for the content of the file.parameters
(OPTIONAL, list). There can be multiple parameters. For each parameter the following fields are present.
name
(REQUIRED, string). The name of the parameter, for example \"epochs\".dtype
(OPTIONAL, string). The datatype of the parameter, for example \"int\".value
(OPTIONAL, string). The value of the parameter, for example 100.labels
(OPTIONAL, list). This field allows to store meta information about a parameter. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the label.dtype
(OPTIONAL, string). The datatype of the feature. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the feature. If name
is set, this field is REQUIRED.results
(OPTIONAL, list). There can be multiple results. For each result the following fields are present.
task
(OPTIONAL, list).
task_type
(REQUIRED, string). The task of the model, for example \"object-classification\".task_name
(OPTIONAL, string). A pretty name for the model tasks, for example \"Object Classification\".datasets
(OPTIONAL, list). There can be multiple datasets 2. For each dataset the following fields are present.
type
(REQUIRED, string). The type of the dataset, can be a dataset id from Hugging Face datasets or any other link to a repository containing the dataset3, for example \"common_voice\".name
(REQUIRED, string). A pretty name for the dataset, for example \"Common Voice (French)\".split
(OPTIONAL, string). The split of the dataset, for example \"train\".features
(OPTIONAL, list[string]). List of feature names.revision
(OPTIONAL, string). Version of the dataset, for example \"5503434ddd753f426f4b38109466949a1217c2bb\".metrics
(OPTIONAL, list). There can be multiple metrics. For each metric the following fields are present.
type
(REQUIRED, string). A metric-id from Hugging Face metrics4, for example accuracy.name
(REQUIRED, string). A descriptive name of the metric. For example \"false positive rate\" is not a descriptive name, but \"training false positive rate w.r.t class x\" is.dtype
(REQUIRED, string). The data type of the metric, for example float
.value
(REQUIRED, string). The value of the metric.labels
(OPTIONAL, list). This field allows storing meta information about a metric. Metrics can, for example, be computed on subgroups of specific features: one can compute the accuracy for examples where the feature \"gender\" is set to \"male\". There can be multiple subgroups, which means that the metric is computed on the intersection of those subgroups. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the feature. For example: \"gender\".type
(OPTIONAL, string). The type of the label. Can for example be set to \"feature\" or \"output_class\". If name
is set, this field is REQUIRED.dtype
(OPTIONAL, string). The datatype of the feature, for example float
. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the feature. If name
is set, this field is REQUIRED. For example: \"male\".measurements
.
bar_plots
(OPTIONAL, list). The purpose of this field is to capture bar plot like measurements, for example SHAP values. There can be multiple bar plots. For each bar plot the following fields are present.
type
(REQUIRED, string). The type of bar plot, for example \"SHAP\".name
(OPTIONAL, string). A pretty name for the plot, for example \"Mean Absolute SHAP Values\".results
(REQUIRED, list). The contents of the bar plot. A result represents a bar. There can be multiple results. For each result the following fields are present.
name
(REQUIRED, string). The name of the bar.value
(REQUIRED, float). The value of the corresponding bar.graph_plots
(OPTIONAL, list). The purpose of this field is to capture graph plot like measurements, such as partial dependence plots. There can be multiple graph plots. For each graph plot the following fields are present.
type
(REQUIRED, string). The type of the graph plot, for example \"partial_dependence\".name
(OPTIONAL, string). A pretty name of the graph, for example \"Partial Dependence Plot\".results
(REQUIRED, list). Results contains the graph plot data. Each graph can depend on a specific output class and feature. There can be multiple results. For each result the following fields are present.
class
(OPTIONAL, string/int/float/bool). The output class name that the graph corresponds to. This field is not always present.feature
(REQUIRED, string). The feature the graph corresponds to. This is required, since all relevant graphs are dependent on features.data
(REQUIRED, list)
x_value
(REQUIRED, float). The $x$-value of the graph.y_value
(REQUIRED, float). The $y$-value of the graph.assessment_card
","text":"An assessment_card
contains the following information.
provenance
(OPTIONAL). In case this Assessment Card is generated from another source file, this field can capture the historical context of the contents of this Assessment Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this Assessment Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, i.e. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of person that initiated the transformations.name
(REQUIRED, string). The name of the assessment.
urn
(OPTIONAL, string). A Uniform Resource Name (URN) of the instrument in the instrument register.date
(REQUIRED, string). The date at which the assessment is completed. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.contents
(REQUIRED, list). There can be multiple items in contents. For each item the following fields are present:
question
(REQUIRED, string). A question.urn
(OPTIONAL, string). A Uniform Resource Name (URN) of the corresponding task in the instrument register.answer
(REQUIRED, string). An answer.remarks
(OPTIONAL, string). A field to put relevant discussion remarks in.authors
(OPTIONAL, list). There can be multiple names. For each name the following field is present.
name
(OPTIONAL, string). The name of the author of the question.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of the answer. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, i.e. 2024-04-16T16:48:14Z
.
version: {system_card_version}\nprovenance:\n git_commit_hash: {git_commit_hash}\n timestamp: {modification_timestamp}\n uri: {modification_uri}\n author: {modification_author}\nname: {system_name}\nupl: {upl_uri}\nowners:\n - oin: {oin}\n organization: {organization_name}\n name: {owner_name}\n email: {owner_email}\n role: {owner_role}\ndescription: {system_description}\nlabels:\n - name: {label_name}\n value: {label_value}\nstatus: {system_status}\npublication_category: {system_publication_cat}\nbegin_date: {system_begin_date}\nend_date: {system_end_date}\ngoal_and_impact: {system_goal_and_impact}\nconsiderations: {system_considerations}\nrisk_management: {system_risk_management}\nhuman_intervention: {system_human_intervention}\nlegal_base:\n - name: {law_name}\n link: {law_uri}\nused_data: {system_used_data}\ntechnical_design: {technical_design}\nexternal_providers:\n - name: {name_external_provider}\n version: {version_external_provider}\nreferences:\n - {reference_uri}\ninteraction_details:\n - {system_interaction_details}\nversion_requirements:\n - {system_version_requirements}\ndeployment_variants:\n - {system_deployment_variants}\nhardware_requirements:\n - {system_hardware_requirements}\nproduct_markings:\n - {system_product_markings}\nuser_interface:\n - description: {system_user_interface}\n link: {system_user_interface_uri}\n snapshot: {system_user_interface_snapshot_uri}\n\nmodels:\n - !include {model_card_uri}\n\nassessments:\n - !include {assessment_card_uri}\n
"},{"location":"projects/tad/reporting-standard/#model-card","title":"Model Card","text":"provenance:\n git_commit_hash: {git_commit_hash}\n timestamp: {modification_timestamp}\n uri: {modification_uri}\n author: {modification_author}\nlanguage:\n - {lang_0}\nlicense:\n license_name: {license_name}\n license_link: {license_uri}\ntags:\n - {tag_0}\nowners:\n - oin: {oin}\n organization: {organization_name}\n name: {owner_name}\n email: {owner_email}\n role: {owner_role}\n\nmodel-index:\n - name: {model_id}\n model: {model_uri}\n artifacts:\n - uri: {model_artifact_uri}\n - content-type: {model_artifact_type}\n - md5-checksum: {md5_checksum}\n parameters:\n - name: {parameter_name}\n dtype: {parameter_dtype}\n value: {parameter_value}\n labels:\n - name: {label_name}\n dtype: {label_type}\n value: {label_value}\n results:\n - task:\n - type: {task_type}\n name: {task_name}\n datasets:\n - type: {dataset_type}\n name: {dataset_name}\n split: {split}\n features:\n - {feature_name}\n revision: {dataset_version}\n metrics:\n - type: {metric_type}\n name: {metric_name}\n dtype: {metric_dtype}\n value: {metric_value}\n labels:\n - name: {label_name}\n type: {label_type}\n dtype: {label_type}\n value: {label_value}\n measurements:\n bar_plots:\n - type: {measurement_type}\n name: {measurement_name}\n results:\n - name: {bar_name}\n value: {bar_value}\n graph_plots:\n - type: {measurement_type}\n name: {measurement_name}\n results:\n - class: {class_name}\n feature: {feature_name}\n data:\n - x_value: {x_value}\n y_value: {y_value}\n
"},{"location":"projects/tad/reporting-standard/#assessment-card","title":"Assessment Card","text":"provenance:\n git_commit_hash: {git_commit_hash}\n timestamp: {modification_timestamp}\n uri: {modification_uri}\n author: {modification_author}\nname: {assessment_name}\nurn: {urn}\ndate: {assessment_date}\ncontents:\n - question: {question_text}\n urn: {urn}\n answer: {answer_text}\n remarks: {remarks_text}\n authors:\n - name: {author_name}\n timestamp: {timestamp}\n
"},{"location":"projects/tad/reporting-standard/#schema","title":"Schema","text":"JSON schema will be added when we publish the first beta version.
"},{"location":"projects/tad/reporting-standard/#changelog","title":"Changelog","text":"Deviation from the Hugging Face specification is in the License field. Hugging Face only accepts dataset id's from Hugging Face license list while we accept any license from Open Source License List.\u00a0\u21a9\u21a9
Deviation from the Hugging Face specification is in the model_index:results:dataset
field. Hugging Face only accepts one dataset, while we accept a list of datasets.\u00a0\u21a9\u21a9
Deviation from the Hugging Face specification is in the Dataset Type field. Hugging Face only accepts dataset ids from Hugging Face datasets, while we also allow any URL pointing to the dataset.\u00a0\u21a9\u21a9
For this extension to work, relevant metrics (such as the false positive rate) have to be added to the Hugging Face metrics; possibly this can be done in our organizational namespace.\u00a0\u21a9\u21a9
This document describes the Transparency of Algorithmic Decision making (TAD) Reporting Standard.
For reproducibility, governance, auditing and sharing of algorithmic systems it is essential to have a reporting standard so that information about an algorithmic system can be shared. This reporting standard describes how information about the different phases of an algorithm's life cycle can be reported. It contains, among other things, descriptive information combined with information about the technical tests and assessments applied.
Disclaimer
The TAD Reporting Standard is work in progress. This means that the current standard is probably suboptimal and will change significantly in future versions.
"},{"location":"projects/tad/reporting-standard/0.1a1/#introduction","title":"Introduction","text":"Inspired by Model Cards for Model Reporting and Papers with Code Model Index this standard almost 1 2 3 4 extends the Hugging Face model card metadata specification to allow for:
metrics_field
from the Hugging Face metadata specification.measurements
.assessments
.Following Hugging Face, this proposed standard will be written in yaml.
This standard does not contain all fields present in the Hugging Face metadata specification. The fields that are optional in the Hugging Face specification and are specific to the Hugging Face interface are omitted.
Another difference is that we divide our implementation into three separate parts.
system_card
, containing information about a group of ML-models which accomplish a specific task.model_card
, containing information about a specific data science model.assessment_card
, containing information about a regulatory assessment.Include statements
These model_card
s and assessment_card
s can be included verbatim into a system_card
, or referenced with an !include
statement, allowing for minimal cards to be compact in a single file. Extensive cards can be split up for readability and maintainability. Our standard allows for the !include
to be used anywhere.
The standard will be written in yaml. Example yaml files are given in the next section. The standard defines three cards: a system_card
, a model_card
and an assessment_card
. A system_card
contains information about an algorithmic system. It can have multiple models and each of these models should have a model_card
. Regulatory assessments can be processed in an assessment_card
. Note that model_card
s and assessment_card
s can be included directly into the system_card
or can be included as separate yaml files with the help of a yaml-include mechanism. For clarity, the latter is preferred and is also used in the examples in the next section.
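To make the include mechanism concrete, here is a minimal sketch spread over three hypothetical files; all names and values are illustrative only:

# system_card.yaml
name: CatDetector
models:
- !include cat_classifier_model.yaml
assessments:
- !include iama.yaml

# cat_classifier_model.yaml
license: Apache-2.0
model-index:
- name: CatClassifier
  model: https://example.com/models/cat-classifier

# iama.yaml
name: IAMA
date: 2024-03-01
contents:
- question: \"What is the goal of the system?\"
  answer: \"Detecting cats in images.\"

Unfolding both !include statements yields the same result as embedding the two cards verbatim in system_card.yaml.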
system_card
","text":"A system_card
contains the following information.
schema_version
(REQUIRED, string). Version of the schema used, for example \"0.1a1\".name
(OPTIONAL, string). Name used to describe the system.upl
(OPTIONAL, string). If this algorithm is part of a product offered by the Dutch Government, it should contain a URI from the Uniform Product List.owners
(list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the model. If oin
is NOT provided, this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.description
(OPTIONAL, string). A short description of the system.
labels
(OPTIONAL, list). This field allows storing meta information about a system. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). Name of the label.value
(OPTIONAL, string). Value of the label.status
(OPTIONAL, string). The status of the system. For example the status can be \"production\".
publication_category
(OPTIONAL, enum[string]). The publication category of the algorithm should be chosen from [\"high_risk\", \"other\"]
.begin_date
(OPTIONAL, string). The first date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD.end_date
(OPTIONAL, string). The last date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD.goal_and_impact
(OPTIONAL, string). The purpose of the system and the impact it has on citizens and companies.considerations
(OPTIONAL, string). The pros and cons of using the system.risk_management
(OPTIONAL, string). Description of the risks associated with the system.human_intervention
(OPTIONAL, string). A description of the extent to which there is human involvement in the system.legal_base
(OPTIONAL, list). If there exists a legal base for the process the system is embedded in, this field can be filled in with the relevant laws. There can be multiple legal bases. For each legal base the following fields are present.name
(OPTIONAL, string). Name of the law.link
(OPTIONAL, string). URI pointing towards the contents of the law.used_data
(OPTIONAL, string). An overview of the data that is used in the system.technical_design
(OPTIONAL, string). Description of how the system works.external_providers
(OPTIONAL, list[string]). Name of an external provider, if relevant. There can be multiple external providers.references
(OPTIONAL, list[string]). Additional reference URIs that point to relevant information about the system.models
(OPTIONAL, list[ModelCard]). A list of model cards (as defined below) or !include
s of a yaml file containing a model card. This model card can for example be a model card described in the next section or a model card from Hugging Face. There can be multiple model cards, meaning multiple models are used.assessments
(OPTIONAL, list[AssessmentCard]). A list of assessment cards (as defined below) or !include
s of a yaml file containing an assessment card. This assessment card is described in the next section. There can be multiple assessment cards, meaning multiple assessments were performed.model_card
","text":"A model_card
contains the following information.
language
(OPTIONAL, list[string]). If relevant, the natural languages the model supports in ISO 639. There can be multiple languages.license
(REQUIRED, string). Any license from the open source license list 1. If the license is NOT present in the license list this field must be set to 'other' and the following two fields will be REQUIRED.
license_name
(string). An id for the license.license_link
(string). A link to a file of that name inside the repo, or a URL to a remote file containing the license contents.tags
(OPTIONAL, list[string]). Tags with keywords to describe the project. There can be multiple tags.
owners
(list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the model. If oin
is NOT provided, this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.There can be multiple models. For each model the following fields are present.
name
(REQUIRED, string). The name of the model.model
(REQUIRED, string). A URI pointing to a repository containing the model file.artifacts
(OPTIONAL, list[string]). A list of URIs, where each URI refers to a relevant model artifact that cannot be captured by any other field but is relevant to the model.parameters
(list). There can be multiple parameters. For each parameter the following fields are present.
name
(REQUIRED, string). The name of the parameter, for example \"epochs\".dtype
(OPTIONAL, string). The datatype of the parameter, for example \"int\".value
(OPTIONAL, string). The value of the parameter, for example 100.labels
(list). This field allows storing meta information about a parameter. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the label.dtype
(OPTIONAL, string). The datatype of the label. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the label. If name
is set, this field is REQUIRED.results
(list). There can be multiple results. For each result the following fields are present.
task
(OPTIONAL, list).
task_type
(REQUIRED, string). The task of the model, for example \"object-classification\".task_name
(OPTIONAL, string). A pretty name for the model task, for example \"Object Classification\".datasets
(list). There can be multiple datasets 2. For each dataset the following fields are present.
type
(REQUIRED, string). The type of the dataset, can be a dataset id from Hugging Face datasets or any other link to a repository containing the dataset3, for example \"common_voice\".name
(REQUIRED, string). A pretty name for the dataset, for example \"Common Voice (French)\".split
(OPTIONAL, string). The split of the dataset, for example \"train\".features
(OPTIONAL, list[string]). List of feature names.revision
(OPTIONAL, string). Version of the dataset, for example 5503434ddd753f426f4b38109466949a1217c2bb.metrics
(list). There can be multiple metrics. For each metric the following fields are present.
type
(REQUIRED, string). A metric-id from Hugging Face metrics4, for example accuracy.name
(REQUIRED, string). A descriptive name of the metric. For example \"false positive rate\" is not a descriptive name, but \"training false positive rate w.r.t class x\" is.dtype
(REQUIRED, string). The data type of the metric, for example float
.value
(REQUIRED, string). The value of the metric.labels
(list). This field allows storing meta information about a metric. Metrics can, for example, be computed on subgroups of specific features: one can compute the accuracy for examples where the feature \"gender\" is set to \"male\". There can be multiple subgroups, in which case the metric is computed on the intersection of those subgroups. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the feature. For example: \"gender\".type
(OPTIONAL, string). The type of the label. Can for example be set to \"feature\" or \"output_class\". If name
is set, this field is REQUIRED.dtype
(OPTIONAL, string). The datatype of the feature, for example float
. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the feature. If name
is set, this field is REQUIRED. For example: \"male\".measurements
.
bar_plots
(list). The purpose of this field is to capture bar plot like measurements, for example SHAP values. There can be multiple bar plots. For each bar plot the following fields are present.
type
(REQUIRED, string). The type of bar plot, for example \"SHAP\".name
(OPTIONAL, string). A pretty name for the plot, for example \"Mean Absolute SHAP Values\".results
(list). The contents of the bar plot. A result represents a bar. There can be multiple results. For each result the following fields are present.name
(REQUIRED, string). The name of the bar.value
(REQUIRED, float). The value of the corresponding bar.graph_plots
(list). The purpose of this field is to capture graph plot like measurements, such as partial dependence plots. There can be multiple graph plots. For each graph plot the following fields are present.
type
(REQUIRED, string). The type of the graph plot, for example \"partial_dependence\".name
(OPTIONAL, string). A pretty name of the graph, for example \"Partial Dependence Plot\".results
(list). Contains the graph plot data. Each graph can depend on a specific output class and feature. There can be multiple results. For each result the following fields are present.class
(OPTIONAL, string/int/float/bool). The output class name that the graph corresponds to. This field is not always present.feature
(REQUIRED, string). The feature the graph corresponds to. This is required, since all relevant graphs are dependent on features.data
(list). The data points of the graph.x_value
(REQUIRED, float). The $x$-value of the graph.y_value
(REQUIRED, float). The $y$-value of the graph.assessment_card
","text":"An assessment_card
contains the following information.
name
(REQUIRED, string). The name of the assessment.date
(REQUIRED, string). The date at which the assessment is completed. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD.contents
(list). There can be multiple items in contents. For each item the following fields are present:
question
(REQUIRED, string). A question.answer
(REQUIRED, string). An answer.remarks
(OPTIONAL, string). A field to put relevant discussion remarks in.authors
. There can be multiple names. For each name the following field is present.name
(OPTIONAL, string). The name of the author of the question.timestamp
(OPTIONAL, string). A timestamp of the date and time of the answer.version: {system_card_version} # Optional. Example: \"0.1a1\"\nname: {system_name} # Optional. Example: \"AangifteVertrekBuitenland\"\nupl: {upl_uri} # Optional. Example: https://standaarden.overheid.nl/owms/terms/AangifteVertrekBuitenland\nowners:\n- oin: {oin} # Optional. Example: 00000001003214345000\n organization: {organization_name} # Optional if oin is provided, Required otherwise. Example: BZK\n name: {owner_name} # Optional. Example: John Doe\n email: {owner_email} # Optional. Example: johndoe@email.com\n role: {owner_role} # Optional. Example: Data Scientist.\ndescription: {system_description} # Optional. Short description of the system.\nlabels: # Optional. Labels to store metadata about the system.\n- name: {label_name} # Optional.\n value: {label_value} # Optional.\nstatus: {system_status} # Optional. Example: \"production\".\npublication_category: {system_publication_cat} # Optional. Example: \"high_risk\".\nbegin_date: {system_begin_date} # Optional. Example: 2025-01-01.\nend_date: {system_end_date} # Optional. Example: 2025-12-01.\ngoal_and_impact: {system_goal_and_impact} # Optional. Goal and impact of the system.\nconsiderations: {system_considerations} # Optional. Considerations about the system.\nrisk_management: {system_risk_management} # Optional. Description of risks associated with the system.\nhuman_intervention: {system_human_intervention} # Optional. Description of human involvement in the system.\nlegal_base:\n- name: {law_name} # Optional. Example: \"AVG\".\n link: {law_uri} # Optional. Example: \"https://eur-lex.europa.eu/legal-content/NL/TXT/HTML/?uri=CELEX:31995L0046\".\nused_data: {system_used_data} # Optional. Description of the data used by the system.\ntechnical_design: {technical_design} # Optional. Description of the technical design of the system.\nexternal_providers:\n- {system_external_provider} # Optional. Reference to used external providers.\nreferences:\n- {reference_uri} # Optional. Example: URI to codebase.\n\nmodels:\n- !include {model_card_uri} # Optional. Example: cat_classifier_model.yaml.\n\nassessments:\n- !include {assessment_card_uri} # Optional. Example: iama.yaml.\n
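For reference, a filled-in sketch of a few of these fields; the values are either the examples from the comments above or hypothetical:

version: \"0.1a1\"
name: \"AangifteVertrekBuitenland\"
owners:
- oin: 00000001003214345000
  organization: BZK
status: \"production\"
publication_category: \"high_risk\"
begin_date: 2025-01-01
labels:
- name: domain
  value: migration

models:
- !include cat_classifier_model.yaml

assessments:
- !include iama.yaml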
"},{"location":"projects/tad/reporting-standard/0.1a1/#model-card","title":"Model Card","text":"language:\n - {lang_0} # Optional. Example nl.\nlicense: {license} # Required. Example: Apache-2.0 or any license SPDX ID from https://opensource.org/license or \"other\".\nlicense_name: {license_name} # Optional if license != other, Required otherwise. Example: 'my-license-1.0'\nlicense_link: {license_link} # Optional if license != other, Required otherwise. Specify \"LICENSE\" or \"LICENSE.md\" to link to a file of that name inside the repo, or a URL to a remote file.\ntags:\n- {tag_0} # Optional. Example: audio\n- {tag_1} # Optional. Example: automatic-speech-recognition\nowners:\n- organization: {organization_name} # Required. Example: BZK\n oin: {oin} # Optional. Example: 00000001003214345000\n name: {owner_name} # Optional. Example: John Doe\n email: {owner_email} # Optional. Example: johndoe@email.com\n role: {owner_role} # Optional. Example: Data Scientist.\n\nmodel-index:\n- name: {model_id} # Required. Example: CatClassifier.\n model: {model_uri} # Required. URI to a repository containing the model file.\n artifacts:\n - {model_artifact} # Optional. URI to relevant model artifacts, if applicable.\n parameters:\n - name: {parameter_name} # Optional. Example: \"epochs\".\n dtype: {parameter_dtype} # Optional. Example: \"int\".\n value: {parameter_value} # Optional. Example: 100.\n labels:\n - name: {label_name} # Optional. Example: \"gender\".\n dtype: {label_type} # Optional. Example: \"string\".\n value: {label_value} # Optional. Example: \"female\".\n results:\n - task:\n type: {task_type} # Required. Example: image-classification.\n name: {task_name} # Optional. Example: Image Classification.\n datasets:\n - type: {dataset_type} # Required. Example: common_voice. Link to a repository containing the dataset\n name: {dataset_name} # Required. Example: \"Common Voice (French)\". A pretty name for the dataset.\n split: {split} # Optional. Example: \"train\".\n features:\n - {feature_name} # Optional. Example: \"gender\".\n revision: {dataset_version} # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb\n metrics:\n - type: {metric_type} # Required. Example: false-positive-rate. Use metric id from https://hf.co/metrics.\n name: {metric_name} # Required. Example: \"FPR wrt class 0 restricted to feature gender:0 and age:21\".\n dtype: {metric_dtype} # Required. Example: \"float\".\n value: {metric_value} # Required. Example: 0.75.\n labels:\n - name: {label_name} # Optional. Example: \"gender\".\n type: {label_type} # Optional. Example: \"feature\".\n dtype: {label_type} # Optional. Example: \"string\".\n value: {label_value} # Optional. Example: \"female\".\n measurements:\n # Bar plots should be able to capture SHAP and Robustness Toolbox from AI Verify.\n bar_plots:\n - type: {measurement_type} # Required. Example: \"SHAP\".\n name: {measurement_name} # Optional. Example: \"Mean Absolute Shap Values\".\n results:\n - name: {bar_name} # Required. The name of a bar.\n value: {bar_value} # Required. The corresponding value.\n # Graph plots should be able to capture graph based measurements such as partial dependence and accumulated local effect.\n graph_plots:\n - type: {measurement_type} # Required. Example: \"partial_dependence\".\n name: {measurement_name} # Optional. Example: \"Partial Dependence Plot\".\n # Results store the graph plot data. 
So far all plots are dependent on a combination of a specific class (sometimes) and feature (always).\n # For example partial dependence plots are made for each feature and class.\n results:\n - class: {class_name} # Optional. Name of the output class the graph depends on.\n feature: {feature_name} # Required. Name of the feature the graph depends on.\n data:\n - x_value: {x_value} # Required. The x value of the graph data.\n y_value: {y_value} # Required. The y value of the graph data.\n
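As an illustration of metric labels, the following hypothetical metric records an accuracy computed only on examples where gender is male and age is 21; because two labels are present, the metric applies to the intersection of both subgroups:

metrics:
- type: accuracy
  name: \"test accuracy restricted to feature gender:male and age:21\"
  dtype: \"float\"
  value: 0.83
  labels:
  - name: gender
    type: feature
    dtype: \"string\"
    value: male
  - name: age
    type: feature
    dtype: \"int\"
    value: 21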
"},{"location":"projects/tad/reporting-standard/0.1a1/#assessment-card","title":"Assessment Card","text":"name: {assessment_name} # Required. Example: IAMA.\ndate: {assessment_date} # Required. Example: 25-03-2025.\ncontents:\n - question: {question_text} # Required. Example: \"Question 1: ...\".\n answer: {answer_text} # Required. Example: \"Answer: ...\".\n remarks: {remarks_text} # Optional. Example: \"Remarks: ...\".\n authors: # Optional. Example: \"['John', 'Peter']\".\n - name: {author_name}\n timestamp: {timestamp} # Optional. Example: 1711630721.\n
"},{"location":"projects/tad/reporting-standard/0.1a1/#schema","title":"Schema","text":"JSON schema will be added when we publish the first beta version.
Deviation from the Hugging Face specification is in the License field. Hugging Face only accepts license ids from the Hugging Face license list, while we accept any license from the Open Source License List.\u00a0\u21a9\u21a9
Deviation from the Hugging Face specification is in the model_index:results:dataset
field. Hugging Face only accepts one dataset, while we accept a list of datasets.\u00a0\u21a9\u21a9
Deviation from the Hugging Face specification is in the Dataset Type field. Hugging Face only accepts dataset ids from Hugging Face datasets, while we also allow any URL pointing to the dataset.\u00a0\u21a9\u21a9
For this extension to work, relevant metrics (such as the false positive rate) have to be added to the Hugging Face metrics; possibly this can be done in our organizational namespace.\u00a0\u21a9\u21a9
This document describes the Transparency of Algorithmic Decision making (TAD) Reporting Standard.
For reproducibility, governance, auditing and sharing of algorithmic systems it is essential to have a reporting standard so that information about an algorithmic system can be shared. This reporting standard describes how information about the different phases of an algorithm's life cycle can be reported. It contains, among other things, descriptive information combined with information about the technical tests and assessments applied.
Disclaimer
The TAD Reporting Standard is work in progress. This means that the current standard is probably suboptimal and will change significantly in future versions.
"},{"location":"projects/tad/reporting-standard/0.1a2/#introduction","title":"Introduction","text":"Inspired by Model Cards for Model Reporting and Papers with Code Model Index this standard almost 1 2 3 4 extends the Hugging Face model card metadata specification to allow for:
metrics_field
from the Hugging Face metadata specification.measurements
.assessments
.Following Hugging Face, this proposed standard will be written in yaml.
This standard does not contain all fields present in the Hugging Face metadata specification. The fields that are optional in the Hugging Face specification and are specific to the Hugging Face interface are omitted.
Another difference is that we divide our implementation into three separate parts.
system_card
, containing information about a group of ML-models which accomplish a specific task.model_card
, containing information about a specific data science model.assessment_card
, containing information about a regulatory assessment.Include statements
These model_card
s and assessment_card
s can be included verbatim into a system_card
, or referenced with an !include
statement, allowing for minimal cards to be compact in a single file. Extensive cards can be split up for readability and maintainability. Our standard allows for the !include
to be used anywhere.
The standard will be written in yaml. Example yaml files are given in the next section. The standard defines three cards: a system_card
, a model_card
and an assessment_card
. A system_card
contains information about an algorithmic system. It can have multiple models and each of these models should have a model_card
. Regulatory assessments can be processed in an assessment_card
. Note that model_card
s and assessment_card
s can be included directly into the system_card
or can be included as separate yaml files with the help of a yaml-include mechanism. For clarity, the latter is preferred and is also used in the examples in the next section.
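Besides referencing separate files with !include, a card can be embedded verbatim. A minimal hypothetical sketch of a system_card with an inline model card:

name: CatDetector
models:
- license: Apache-2.0
  model-index:
  - name: CatClassifier
    model: https://example.com/models/cat-classifier
assessments:
- !include iama.yaml

Both forms are equivalent; separate files are simply easier to maintain for extensive cards.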
system_card
","text":"A system_card
contains the following information.
schema_version
(REQUIRED, string). Version of the schema used, for example \"0.1a2\".name
(OPTIONAL, string). Name used to describe the system.upl
(OPTIONAL, string). If this algorithm is part of a product offered by the Dutch Government, it should contain a URI from the Uniform Product List.owners
(list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the model. If oin
is NOT provided, this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.description
(OPTIONAL, string). A short description of the system.
labels
(OPTIONAL, list). This field allows storing meta information about a system. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). Name of the label.value
(OPTIONAL, string). Value of the label.status
(OPTIONAL, string). The status of the system. For example the status can be \"production\".
publication_category
(OPTIONAL, enum[string]). The publication category of the algorithm should be chosen from [\"high_risk\", \"other\"]
.begin_date
(OPTIONAL, string). The first date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD.end_date
(OPTIONAL, string). The last date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD.goal_and_impact
(OPTIONAL, string). The purpose of the system and the impact it has on citizens and companies.considerations
(OPTIONAL, string). The pros and cons of using the system.risk_management
(OPTIONAL, string). Description of the risks associated with the system.human_intervention
(OPTIONAL, string). A description of the extent to which there is human involvement in the system.legal_base
(OPTIONAL, list). If there exists a legal base for the process the system is embedded in, this field can be filled in with the relevant laws. There can be multiple legal bases. For each legal base the following fields are present.name
(OPTIONAL, string). Name of the law.link
(OPTIONAL, string). URI pointing towards the contents of the law.used_data
(OPTIONAL, string). An overview of the data that is used in the system.technical_design
(OPTIONAL, string). Description of how the system works.external_providers
(OPTIONAL, list[string]). Name of an external provider, if relevant. There can be multiple external providers.references
(OPTIONAL, list[string]). Additional reference URIs that point to relevant information about the system.models
(OPTIONAL, list[ModelCard]). A list of model cards (as defined below) or !include
s of a yaml file containing a model card. This model card can for example be a model card described in the next section or a model card from Hugging Face. There can be multiple model cards, meaning multiple models are used.assessments
(OPTIONAL, list[AssessmentCard]). A list of assessment cards (as defined below) or !include
s of a yaml file containing an assessment card. This assessment card is described in the next section. There can be multiple assessment cards, meaning multiple assessments were performed.model_card
","text":"A model_card
contains the following information.
language
(OPTIONAL, list[string]). If relevant, the natural languages the model supports in ISO 639. There can be multiple languages.license
(REQUIRED, string). Any license from the open source license list 1. If the license is NOT present in the license list this field must be set to 'other' and the following two fields will be REQUIRED.
license_name
(string). An id for the license.license_link
(string). A link to a file of that name inside the repo, or a URL to a remote file containing the license contents.tags
(OPTIONAL, list[string]). Tags with keywords to describe the project. There can be multiple tags.
owners
(list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the model. If oin
is NOT provided, this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.There can be multiple models. For each model the following fields are present.
name
(REQUIRED, string). The name of the model.model
(REQUIRED, string). A URI pointing to a repository containing the model file.artifacts
(OPTIONAL, list). A list of artifacts. For each artifact the following fields are present.
uri
(OPTIONAL, string). URI that refers to a relevant model artifact.content-type
(OPTIONAL, string). The content type of the artifact, following the Content-Type header convention. A recognized value is \"application/onnx\", which refers to an ONNX representation of the model.md5-checksum
(OPTIONAL, string). MD5 checksum of the content of the file.parameters
(list). There can be multiple parameters. For each parameter the following fields are present.
name
(REQUIRED, string). The name of the parameter, for example \"epochs\".dtype
(OPTIONAL, string). The datatype of the parameter, for example \"int\".value
(OPTIONAL, string). The value of the parameter, for example 100.labels
(list). This field allows storing meta information about a parameter. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the label.dtype
(OPTIONAL, string). The datatype of the label. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the label. If name
is set, this field is REQUIRED.results
(list). There can be multiple results. For each result the following fields are present.
task
(OPTIONAL, list).
task_type
(REQUIRED, string). The task of the model, for example \"object-classification\".task_name
(OPTIONAL, string). A pretty name for the model task, for example \"Object Classification\".datasets
(list). There can be multiple datasets 2. For each dataset the following fields are present.
type
(REQUIRED, string). The type of the dataset, can be a dataset id from Hugging Face datasets or any other link to a repository containing the dataset3, for example \"common_voice\".name
(REQUIRED, string). A pretty name for the dataset, for example \"Common Voice (French)\".split
(OPTIONAL, string). The split of the dataset, for example \"train\".features
(OPTIONAL, list[string]). List of feature names.revision
(OPTIONAL, string). Version of the dataset, for example 5503434ddd753f426f4b38109466949a1217c2bb.metrics
(list). There can be multiple metrics. For each metric the following fields are present.
type
(REQUIRED, string). A metric-id from Hugging Face metrics4, for example accuracy.name
(REQUIRED, string). A descriptive name of the metric. For example \"false positive rate\" is not a descriptive name, but \"training false positive rate w.r.t class x\" is.dtype
(REQUIRED, string). The data type of the metric, for example float
.value
(REQUIRED, string). The value of the metric.labels
(list). This field allows storing meta information about a metric. Metrics can, for example, be computed on subgroups of specific features: one can compute the accuracy for examples where the feature \"gender\" is set to \"male\". There can be multiple subgroups, in which case the metric is computed on the intersection of those subgroups. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the feature. For example: \"gender\".type
(OPTIONAL, string). The type of the label. Can for example be set to \"feature\" or \"output_class\". If name
is set, this field is REQUIRED.dtype
(OPTIONAL, string). The datatype of the feature, for example float
. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the feature. If name
is set, this field is REQUIRED. For example: \"male\".measurements
.
bar_plots
(list). The purpose of this field is to capture bar plot like measurements, for example SHAP values. There can be multiple bar plots. For each bar plot the following fields are present.
type
(REQUIRED, string). The type of bar plot, for example \"SHAP\".name
(OPTIONAL, string). A pretty name for the plot, for example \"Mean Absolute SHAP Values\".results
(list). The contents of the bar plot. A result represents a bar. There can be multiple results. For each result the following fields are present.name
(REQUIRED, string). The name of the bar.value
(REQUIRED, float). The value of the corresponding bar.graph_plots
(list). The purpose of this field is to capture graph plot like measurements, such as partial dependence plots. There can be multiple graph plots. For each graph plot the following fields are present.
type
(REQUIRED, string). The type of the graph plot, for example \"partial_dependence\".name
(OPTIONAL, string). A pretty name of the graph, for example \"Partial Dependence Plot\".results
(list). Contains the graph plot data. Each graph can depend on a specific output class and feature. There can be multiple results. For each result the following fields are present.class
(OPTIONAL, string/int/float/bool). The output class name that the graph corresponds to. This field is not always present.feature
(REQUIRED, string). The feature the graph corresponds to. This is required, since all relevant graphs are dependent on features.data
(list). The data points of the graph.x_value
(REQUIRED, float). The $x$-value of the graph.y_value
(REQUIRED, float). The $y$-value of the graph.assessment_card
","text":"An assessment_card
contains the following information.
name
(REQUIRED, string). The name of the assessment.date
(REQUIRED, string). The date at which the assessment is completed. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD.contents
(list). There can be multiple items in contents. For each item the following fields are present:
question
(REQUIRED, string). A question.answer
(REQUIRED, string). An answer.remarks
(OPTIONAL, string). A field to put relevant discussion remarks in.authors
. There can be multiple names. For each name the following field is present.name
(OPTIONAL, string). The name of the author of the question.timestamp
(OPTIONAL, string). A timestamp of the date and time of the answer.version: {system_card_version} # Optional. Example: \"0.1a2\"\nname: {system_name} # Optional. Example: \"AangifteVertrekBuitenland\"\nupl: {upl_uri} # Optional. Example: https://standaarden.overheid.nl/owms/terms/AangifteVertrekBuitenland\nowners:\n- oin: {oin} # Optional. Example: 00000001003214345000\n organization: {organization_name} # Optional if oin is provided, Required otherwise. Example: BZK\n name: {owner_name} # Optional. Example: John Doe\n email: {owner_email} # Optional. Example: johndoe@email.com\n role: {owner_role} # Optional. Example: Data Scientist.\ndescription: {system_description} # Optional. Short description of the system.\nlabels: # Optional. Labels to store metadata about the system.\n- name: {label_name} # Optional.\n value: {label_value} # Optional.\nstatus: {system_status} # Optional. Example: \"production\".\npublication_category: {system_publication_cat} # Optional. Example: \"high_risk\".\nbegin_date: {system_begin_date} # Optional. Example: 2025-01-01.\nend_date: {system_end_date} # Optional. Example: 2025-12-01.\ngoal_and_impact: {system_goal_and_impact} # Optional. Goal and impact of the system.\nconsiderations: {system_considerations} # Optional. Considerations about the system.\nrisk_management: {system_risk_management} # Optional. Description of risks associated with the system.\nhuman_intervention: {system_human_intervention} # Optional. Description of human involvement in the system.\nlegal_base:\n- name: {law_name} # Optional. Example: \"AVG\".\n link: {law_uri} # Optional. Example: \"https://eur-lex.europa.eu/legal-content/NL/TXT/HTML/?uri=CELEX:31995L0046\".\nused_data: {system_used_data} # Optional. Description of the data used by the system.\ntechnical_design: {technical_design} # Optional. Description of the technical design of the system.\nexternal_providers:\n- {system_external_provider} # Optional. Reference to used external providers.\nreferences:\n- {reference_uri} # Optional. Example: URI to codebase.\n\nmodels:\n- !include {model_card_uri} # Optional. Example: cat_classifier_model.yaml.\n\nassessments:\n- !include {assessment_card_uri} # Optional. Example: iama.yaml.\n
"},{"location":"projects/tad/reporting-standard/0.1a2/#model-card","title":"Model Card","text":"language:\n - {lang_0} # Optional. Example nl.\nlicense: {license} # Required. Example: Apache-2.0 or any license SPDX ID from https://opensource.org/license or \"other\".\nlicense_name: {license_name} # Optional if license != other, Required otherwise. Example: 'my-license-1.0'\nlicense_link: {license_link} # Optional if license != other, Required otherwise. Specify \"LICENSE\" or \"LICENSE.md\" to link to a file of that name inside the repo, or a URL to a remote file.\ntags:\n- {tag_0} # Optional. Example: audio\n- {tag_1} # Optional. Example: automatic-speech-recognition\nowners:\n- organization: {organization_name} # Required. Example: BZK\n oin: {oin} # Optional. Example: 00000001003214345000\n name: {owner_name} # Optional. Example: John Doe\n email: {owner_email} # Optional. Example: johndoe@email.com\n role: {owner_role} # Optional. Example: Data Scientist.\n\nmodel-index:\n- name: {model_id} # Required. Example: CatClassifier.\n model: {model_uri} # Required. URI to a repository containing the model file.\n artifacts:\n - uri: {model_artifact_uri} # Optional. Example: \"https://github.com/MinBZK/poc-kijkdoos-wasm-models/raw/main/logres_iris/logreg_iris.onnx\"\n - content-type: {model_artifact_type} # Optional. Example: \"application/onnx\".\n - md5-checksum: {md5_checksum} # Optional. Example: \"120EA8A25E5D487BF68B5F7096440019\"\n parameters:\n - name: {parameter_name} # Optional. Example: \"epochs\".\n dtype: {parameter_dtype} # Optional. Example: \"int\".\n value: {parameter_value} # Optional. Example: 100.\n labels:\n - name: {label_name} # Optional. Example: \"gender\".\n dtype: {label_type} # Optional. Example: \"string\".\n value: {label_value} # Optional. Example: \"female\".\n results:\n - task:\n type: {task_type} # Required. Example: image-classification.\n name: {task_name} # Optional. Example: Image Classification.\n datasets:\n - type: {dataset_type} # Required. Example: common_voice. Link to a repository containing the dataset\n name: {dataset_name} # Required. Example: \"Common Voice (French)\". A pretty name for the dataset.\n split: {split} # Optional. Example: \"train\".\n features:\n - {feature_name} # Optional. Example: \"gender\".\n revision: {dataset_version} # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb\n metrics:\n - type: {metric_type} # Required. Example: false-positive-rate. Use metric id from https://hf.co/metrics.\n name: {metric_name} # Required. Example: \"FPR wrt class 0 restricted to feature gender:0 and age:21\".\n dtype: {metric_dtype} # Required. Example: \"float\".\n value: {metric_value} # Required. Example: 0.75.\n labels:\n - name: {label_name} # Optional. Example: \"gender\".\n type: {label_type} # Optional. Example: \"feature\".\n dtype: {label_type} # Optional. Example: \"string\".\n value: {label_value} # Optional. Example: \"female\".\n measurements:\n # Bar plots should be able to capture SHAP and Robustness Toolbox from AI Verify.\n bar_plots:\n - type: {measurement_type} # Required. Example: \"SHAP\".\n name: {measurement_name} # Optional. Example: \"Mean Absolute Shap Values\".\n results:\n - name: {bar_name} # Required. The name of a bar.\n value: {bar_value} # Required. The corresponding value.\n # Graph plots should be able to capture graph based measurements such as partial dependence and accumulated local effect.\n graph_plots:\n - type: {measurement_type} # Required. 
Example: \"partial_dependence\".\n name: {measurement_name} # Optional. Example: \"Partial Dependence Plot\".\n # Results store the graph plot data. So far all plots are dependent on a combination of a specific class (sometimes) and feature (always).\n # For example partial dependence plots are made for each feature and class.\n results:\n - class: {class_name} # Optional. Name of the output class the graph depends on.\n feature: {feature_name} # Required. Name of the feature the graph depends on.\n data:\n - x_value: {x_value} # Required. The x value of the graph data.\n y_value: {y_value} # Required. The y value of the graph data.\n
"},{"location":"projects/tad/reporting-standard/0.1a2/#assessment-card","title":"Assessment Card","text":"name: {assessment_name} # Required. Example: IAMA.\ndate: {assessment_date} # Required. Example: 25-03-2025.\ncontents:\n - question: {question_text} # Required. Example: \"Question 1: ...\".\n answer: {answer_text} # Required. Example: \"Answer: ...\".\n remarks: {remarks_text} # Optional. Example: \"Remarks: ...\".\n authors: # Optional. Example: \"['John', 'Peter']\".\n - name: {author_name}\n timestamp: {timestamp} # Optional. Example: 1711630721.\n
"},{"location":"projects/tad/reporting-standard/0.1a2/#schema","title":"Schema","text":"JSON schema will be added when we publish the first beta version.
"},{"location":"projects/tad/reporting-standard/0.1a2/#changelog","title":"Changelog","text":"Deviation from the Hugging Face specification is in the License field. Hugging Face only accepts dataset id's from Hugging Face license list while we accept any license from Open Source License List.\u00a0\u21a9\u21a9
Deviation from the Hugging Face specification is in the model_index:results:dataset
field. Hugging Face only accepts one dataset, while we accept a list of datasets.\u00a0\u21a9\u21a9
Deviation from the Hugging Face specification is in the Dataset Type field. Hugging Face only accepts dataset ids from Hugging Face datasets, while we also allow any URL pointing to the dataset.\u00a0\u21a9\u21a9
For this extension to work, relevant metrics (such as the false positive rate) have to be added to the Hugging Face metrics; possibly this can be done in our organizational namespace.\u00a0\u21a9\u21a9
This document describes the Transparency of Algorithmic Decision making (TAD) Reporting Standard.
For reproducibility, governance, auditing and sharing of algorithmic systems it is essential to have a reporting standard so that information about an algorithmic system can be shared. This reporting standard describes how information about the different phases of an algorithm's life cycle can be reported. It contains, among other things, descriptive information combined with information about the technical tests and assessments applied.
Disclaimer
The TAD Reporting Standard is work in progress. This means that the current standard is probably suboptimal and will change significantly in future versions.
"},{"location":"projects/tad/reporting-standard/0.1a3/#introduction","title":"Introduction","text":"Inspired by Model Cards for Model Reporting and Papers with Code Model Index this standard almost 1 2 3 4 extends the Hugging Face model card metadata specification to allow for:
metrics_field
from the Hugging Face metadata specification.measurements
.assessments
.Following Hugging Face, this proposed standard will be written in yaml.
This standard does not contain all fields present in the Hugging Face metadata specification. The fields that are optional in the Hugging Face specification and are specific to the Hugging Face interface are omitted.
Another difference is that we divide our implementation into three separate parts.
system_card
, containing information about a group of ML-models which accomplish a specific task.model_card
, containing information about a specific data science model.assessment_card
, containing information about a regulatory assessment.Include statements
These model_card
s and assessment_card
s can be included verbatim into a system_card
, or referenced with an !include
statement, allowing for minimal cards to be compact in a single file. Extensive cards can be split up for readability and maintainability. Our standard allows for the !include
to be used anywhere.
The standard will be written in yaml. Example yaml files are given in the next section. The standard defines three cards: a system_card
, a model_card
and an assessment_card
. A system_card
contains information about an algorithmic system. It can have multiple models and each of these models should have a model_card
. Regulatory assessments can be processed in an assessment_card
. Note that model_card
s and assessment_card
s can be included directly into the system_card
or can be included as separate yaml files with the help of a yaml-include mechanism. For clarity, the latter is preferred and is also used in the examples in the next section.
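Since a system can contain several models, a system_card may reference multiple model cards, each kept in its own yaml file. A hypothetical sketch:

name: FraudScreeningSystem
models:
- !include risk_scoring_model.yaml
- !include document_classifier_model.yaml
assessments:
- !include iama.yaml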
system_card
","text":"A system_card
contains the following information.
schema_version
(REQUIRED, string). Version of the schema used, for example \"0.1a2\".name
(OPTIONAL, string). Name used to describe the system.upl
(OPTIONAL, string). If this algorithm is part of a product offered by the Dutch Government, it should contain a URI from the Uniform Product List.owners
(list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the model. If oin
is NOT provided, this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.description
(OPTIONAL, string). A short description of the system.
labels
(OPTIONAL, list). This field allows storing meta information about a system. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). Name of the label.value
(OPTIONAL, string). Value of the label.status
(OPTIONAL, string). The status of the system. For example the status can be \"production\".
publication_category
(OPTIONAL, enum[string]). The publication category of the algorithm should be chosen from [\"high_risk\", \"other\"]
.begin_date
(OPTIONAL, string). The first date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.end_date
(OPTIONAL, string). The last date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.goal_and_impact
(OPTIONAL, string). The purpose of the system and the impact it has on citizens and companies.considerations
(OPTIONAL, string). The pros and cons of using the system.risk_management
(OPTIONAL, string). Description of the risks associated with the system.human_intervention
(OPTIONAL, string). A description of the extent to which there is human involvement in the system.legal_base
(OPTIONAL, list). If there exists a legal base for the process the system is embedded in, this field can be filled in with the relevant laws. There can be multiple legal bases. For each legal base the following fields are present.name
(OPTIONAL, string). Name of the law.link
(OPTIONAL, string). URI pointing towards the contents of the law.used_data
(OPTIONAL, string). An overview of the data that is used in the system.technical_design
(OPTIONAL, string). Description of how the system works.external_providers
(OPTIONAL, list[string]). Name of an external provider, if relevant. There can be multiple external providers.references
(OPTIONAL, list[string]). Additional reference URIs that point to relevant information about the system.models
(OPTIONAL, list[ModelCard]). A list of model cards (as defined below) or !include
s of a yaml file containing a model card. This model card can for example be a model card described in the next section or a model card from Hugging Face. There can be multiple model cards, meaning multiple models are used.assessments
(OPTIONAL, list[AssessmentCard]). A list of assessment cards (as defined below) or !include
s of a yaml file containing an assessment card. This assessment card is described in the next section. There can be multiple assessment cards, meaning multiple assessments were performed.model_card
","text":"A model_card
contains the following information.
language
(OPTIONAL, list[string]). If relevant, the natural languages the model supports in ISO 639. There can be multiple languages.license
(REQUIRED, string). Any license from the open source license list 1. If the license is NOT present in the license list this field must be set to 'other' and the following two fields will be REQUIRED.
license_name
(string). An id for the license.license_link
(string). A link to a file of that name inside the repo, or a URL to a remote file containing the license contents.tags
(OPTIONAL, list[string]). Tags with keywords to describe the project. There can be multiple tags.
owners
(list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the model. If oin
is NOT provided, this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.There can be multiple models. For each model the following fields are present.
name
(REQUIRED, string). The name of the model.model
(REQUIRED, string). A URI pointing to a repository containing the model file.artifacts
(OPTIONAL, list). A list of artifacts. For each artifact the following fields are present.
uri
(OPTIONAL, string). URI that refers to a relevant model artifact.content-type
(OPTIONAL, string). The content type of the artifact, following the Content-Type header convention. A recognized value is \"application/onnx\", which refers to an ONNX representation of the model.md5-checksum
(OPTIONAL, string). MD5 checksum of the content of the file.parameters
(list). There can be multiple parameters. For each parameter the following fields are present.
name
(REQUIRED, string). The name of the parameter, for example \"epochs\".dtype
(OPTIONAL, string). The datatype of the parameter, for example \"int\".value
(OPTIONAL, string). The value of the parameter, for example 100.labels
(list). This field allows storing meta information about a parameter. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the label.dtype
(OPTIONAL, string). The datatype of the label. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the label. If name
is set, this field is REQUIRED.results
(list). There can be multiple results. For each result the following fields are present.
task
(OPTIONAL, list).
task_type
(REQUIRED, string). The task of the model, for example \"object-classification\".task_name
(OPTIONAL, string). A pretty name for the model task, for example \"Object Classification\".datasets
(list). There can be multiple datasets 2. For each dataset the following fields are present.
type
(REQUIRED, string). The type of the dataset, can be a dataset id from Hugging Face datasets or any other link to a repository containing the dataset3, for example \"common_voice\".name
(REQUIRED, string). A pretty name for the dataset, for example \"Common Voice (French)\".split
(OPTIONAL, string). The split of the dataset, for example \"train\".features
(OPTIONAL, list[string]). List of feature names.revision
(OPTIONAL, string). Version of the dataset, for example 5503434ddd753f426f4b38109466949a1217c2bb.metrics
(list). There can be multiple metrics. For each metric the following fields are present.
type
(REQUIRED, string). A metric-id from Hugging Face metrics4, for example accuracy.name
(REQUIRED, string). A descriptive name of the metric. For example \"false positive rate\" is not a descriptive name, but \"training false positive rate w.r.t class x\" is.dtype
(REQUIRED, string). The data type of the metric, for example float
.value
(REQUIRED, string). The value of the metric.labels
(list). This field allows storing meta information about a metric. Metrics can, for example, be computed on subgroups of specific features: one can compute the accuracy for examples where the feature \"gender\" is set to \"male\". There can be multiple subgroups, in which case the metric is computed on the intersection of those subgroups. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the feature. For example: \"gender\".type
(OPTIONAL, string). The type of the label. Can for example be set to \"feature\" or \"output_class\". If name
is set, this field is REQUIRED.dtype
(OPTIONAL, string). The datatype of the feature, for example float
. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the feature. If name
is set, this field is REQUIRED. For example: \"male\".measurements
.
bar_plots
(list). The purpose of this field is to capture bar plot like measurements, for example SHAP values. There can be multiple bar plots. For each bar plot the following fields are present.
type
(REQUIRED, string). The type of bar plot, for example \"SHAP\".name
(OPTIONAL, string). A pretty name for the plot, for example \"Mean Absolute SHAP Values\".results
(list). The contents of the bar plot. A result represents a bar. There can be multiple results. For each result the following fields are present.name
(REQUIRED, string). The name of the bar.value
(REQUIRED, float). The value of the corresponding bar.graph_plots
(list). The purpose of this field is to capture graph plot like measurements, such as partial dependence plots. There can be multiple graph plots. For each graph plot the following fields are present.
type
(REQUIRED, string). The type of the graph plot, for example \"partial_dependence\".name
(OPTIONAL, string). A pretty name of the graph, for example \"Partial Dependence Plot\".results
(list). Contains the graph plot data. Each graph can depend on a specific output class and feature. There can be multiple results. For each result the following fields are present.class
(OPTIONAL, string/int/float/bool). The output class name that the graph corresponds to. This field is not always present.feature
(REQUIRED, string). The feature the graph corresponds to. This is required, since all relevant graphs are dependent on features.data
(list). The data points of the graph.x_value
(REQUIRED, float). The $x$-value of the graph.y_value
(REQUIRED, float). The $y$-value of the graph.assessment_card
","text":"An assessment_card
contains the following information.
name
(REQUIRED, string). The name of the assessment.date
(REQUIRED, string). The date at which the assessment is completed. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.contents
(list). There can be multiple items in contents. For each item the following fields are present:
question
(REQUIRED, string). A question.answer
(REQUIRED, string). An answer.remarks
(OPTIONAL, string). A field to put relevant discussion remarks in.authors
. There can be multiple names. For each name the following field is present.name
(OPTIONAL, string). The name of the author of the question.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of the answer. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, i.e. 2024-04-16T16:48:14Z
.version: {system_card_version} # Optional. Example: \"0.1a3\"\nname: {system_name} # Optional. Example: \"AangifteVertrekBuitenland\"\nupl: {upl_uri} # Optional. Example: https://standaarden.overheid.nl/owms/terms/AangifteVertrekBuitenland\nowners:\n- oin: {oin} # Optional. Example: 00000001003214345000\n organization: {organization_name} # Optional if oin is provided, Required otherwise. Example: BZK\n name: {owner_name} # Optional. Example: John Doe\n email: {owner_email} # Optional. Example: johndoe@email.com\n role: {owner_role} # Optional. Example: Data Scientist.\ndescription: {system_description} # Optional. Short description of the system.\nlabels: # Optional. Labels to store metadata about the system.\n- name: {label_name} # Optional.\n value: {label_value} # Optional.\nstatus: {system_status} # Optional. Example: \"production\".\npublication_category: {system_publication_cat} # Optional. Example: \"high_risk\".\nbegin_date: {system_begin_date} # Optional. Example: 2025-01-01.\nend_date: {system_end_date} # Optional. Example: 2025-12-01.\ngoal_and_impact: {system_goal_and_impact} # Optional. Goal and impact of the system.\nconsiderations: {system_considerations} # Optional. Considerations about the system.\nrisk_management: {system_risk_management} # Optional. Description of risks associated with the system.\nhuman_intervention: {system_human_intervention} # Optional. Description of human involvement in the system.\nlegal_base:\n- name: {law_name} # Optional. Example: \"AVG\".\n link: {law_uri} # Optional. Example: \"https://eur-lex.europa.eu/legal-content/NL/TXT/HTML/?uri=CELEX:31995L0046\".\nused_data: {system_used_data} # Optional. Description of the data used by the system.\ntechnical_design: {technical_design} # Optional. Description of the technical design of the system.\nexternal_providers:\n- {system_external_provider} # Optional. Reference to used external providers.\nreferences:\n- {reference_uri} # Optional. Example: URI to codebase.\n\nmodels:\n- !include {model_card_uri} # Optional. Example: cat_classifier_model.yaml.\n\nassessments:\n- !include {assessment_card_uri} # Optional. Example: iama.yaml.\n
"},{"location":"projects/tad/reporting-standard/0.1a3/#model-card","title":"Model Card","text":"language:\n - {lang_0} # Optional. Example nl.\nlicense: {license} # Required. Example: Apache-2.0 or any license SPDX ID from https://opensource.org/license or \"other\".\nlicense_name: {license_name} # Optional if license != other, Required otherwise. Example: 'my-license-1.0'\nlicense_link: {license_link} # Optional if license != other, Required otherwise. Specify \"LICENSE\" or \"LICENSE.md\" to link to a file of that name inside the repo, or a URL to a remote file.\ntags:\n- {tag_0} # Optional. Example: audio\n- {tag_1} # Optional. Example: automatic-speech-recognition\nowners:\n- organization: {organization_name} # Required. Example: BZK\n oin: {oin} # Optional. Example: 00000001003214345000\n name: {owner_name} # Optional. Example: John Doe\n email: {owner_email} # Optional. Example: johndoe@email.com\n role: {owner_role} # Optional. Example: Data Scientist.\n\nmodel-index:\n- name: {model_id} # Required. Example: CatClassifier.\n model: {model_uri} # Required. URI to a repository containing the model file.\n artifacts:\n - uri: {model_artifact_uri} # Optional. Example: \"https://github.com/MinBZK/poc-kijkdoos-wasm-models/raw/main/logres_iris/logreg_iris.onnx\"\n - content-type: {model_artifact_type} # Optional. Example: \"application/onnx\".\n - md5-checksum: {md5_checksum} # Optional. Example: \"120EA8A25E5D487BF68B5F7096440019\"\n parameters:\n - name: {parameter_name} # Optional. Example: \"epochs\".\n dtype: {parameter_dtype} # Optional. Example: \"int\".\n value: {parameter_value} # Optional. Example: 100.\n labels:\n - name: {label_name} # Optional. Example: \"gender\".\n dtype: {label_type} # Optional. Example: \"string\".\n value: {label_value} # Optional. Example: \"female\".\n results:\n - task:\n type: {task_type} # Required. Example: image-classification.\n name: {task_name} # Optional. Example: Image Classification.\n datasets:\n - type: {dataset_type} # Required. Example: common_voice. Link to a repository containing the dataset\n name: {dataset_name} # Required. Example: \"Common Voice (French)\". A pretty name for the dataset.\n split: {split} # Optional. Example: \"train\".\n features:\n - {feature_name} # Optional. Example: \"gender\".\n revision: {dataset_version} # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb\n metrics:\n - type: {metric_type} # Required. Example: false-positive-rate. Use metric id from https://hf.co/metrics.\n name: {metric_name} # Required. Example: \"FPR wrt class 0 restricted to feature gender:0 and age:21\".\n dtype: {metric_dtype} # Required. Example: \"float\".\n value: {metric_value} # Required. Example: 0.75.\n labels:\n - name: {label_name} # Optional. Example: \"gender\".\n type: {label_type} # Optional. Example: \"feature\".\n dtype: {label_type} # Optional. Example: \"string\".\n value: {label_value} # Optional. Example: \"female\".\n measurements:\n # Bar plots should be able to capture SHAP and Robustness Toolbox from AI Verify.\n bar_plots:\n - type: {measurement_type} # Required. Example: \"SHAP\".\n name: {measurement_name} # Optional. Example: \"Mean Absolute Shap Values\".\n results:\n - name: {bar_name} # Required. The name of a bar.\n value: {bar_value} # Required. The corresponding value.\n # Graph plots should be able to capture graph based measurements such as partial dependence and accumulated local effect.\n graph_plots:\n - type: {measurement_type} # Required. 
Example: \"partial_dependence\".\n name: {measurement_name} # Optional. Example: \"Partial Dependence Plot\".\n # Results store the graph plot data. So far all plots are dependent on a combination of a specific class (sometimes) and feature (always).\n # For example partial dependence plots are made for each feature and class.\n results:\n - class: {class_name} # Optional. Name of the output class the graph depends on.\n feature: {feature_name} # Required. Name of the feature the graph depends on.\n data:\n - x_value: {x_value} # Required. The x value of the graph data.\n y_value: {y_value} # Required. The y value of the graph data.\n
"},{"location":"projects/tad/reporting-standard/0.1a3/#assessment-card","title":"Assessment Card","text":"name: {assessment_name} # Required. Example: IAMA.\ndate: {assessment_date} # Required. Example: 25-03-2025.\ncontents:\n - question: {question_text} # Required. Example: \"Question 1: ...\".\n answer: {answer_text} # Required. Example: \"Answer: ...\".\n remarks: {remarks_text} # Optional. Example: \"Remarks: ...\".\n authors: # Optional. Example: \"['John', 'Peter']\".\n - name: {author_name}\n timestamp: {timestamp} # Optional. Example: 2024-04-16T16:48:14Z.\n
"},{"location":"projects/tad/reporting-standard/0.1a3/#schema","title":"Schema","text":"JSON schema will be added when we publish the first beta version.
"},{"location":"projects/tad/reporting-standard/0.1a3/#changelog","title":"Changelog","text":"Deviation from the Hugging Face specification is in the License field. Hugging Face only accepts dataset id's from Hugging Face license list while we accept any license from Open Source License List.\u00a0\u21a9\u21a9
Deviation from the Hugging Face specification is in the model-index:results:datasets
field. Hugging Face only accepts one dataset, while we accept a list of datasets.
Deviation from the Hugging Face specification is in the Dataset Type field. Hugging Face only accepts dataset IDs from Hugging Face datasets, while we also allow any URL pointing to the dataset.
For this extension to work, relevant metrics (such as, for example, the false positive rate) have to be added to the Hugging Face metrics; possibly this can be done in our organizational namespace.
This document describes the Transparency of Algorithmic Decision making (TAD) Reporting Standard.
For reproducibility, governance, auditing and sharing of algorithmic systems it is essential to have a reporting standard so that information about an algorithmic system can be shared. This reporting standard describes how information about the different phases of an algorithm's life cycle can be reported. It contains, among other things, descriptive information combined with information about the technical tests and assessments applied.
Disclaimer
The TAD Reporting Standard is a work in progress. This means that the current standard is probably suboptimal and will change significantly in future versions.
"},{"location":"projects/tad/reporting-standard/0.1a4/#introduction","title":"Introduction","text":"Inspired by Model Cards for Model Reporting and Papers with Code Model Index this standard almost 1 2 3 4 extends the Hugging Face model card metadata specification to allow for:
metrics_field
from the Hugging Face metadata specification.measurements
.assessments
.Following Hugging Face, this proposed standard will be written in YAML.
This standard does not contain all fields present in the Hugging Face metadata specification. The fields that are optional in the Hugging Face specification and are specific to the Hugging Face interface are omitted.
Another difference is that we divide our implementation into three separate parts.
system_card
, containing information about a group of ML-models which accomplish a specific task.model_card
, containing information about a specific data science model.assessment_card
, containing information about a regulatory assessment.Include statements
These model_card
s and assessment_card
s can be included verbatim into a system_card
, or referenced with an !include
statement, allowing for minimal cards to be compact in a single file. Extensive cards can be split up for readability and maintainability. Our standard allows for the !include
to be used anywhere.
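As a minimal sketch (the file names are hypothetical and not prescribed by the standard), a compact system_card that pulls in its model and assessment cards via !include could look like: schema_version: \"0.1a4\"\nname: ExampleSystem # Hypothetical name.\nmodels:\n- !include example_model.yaml # Hypothetical file.\nassessments:\n- !include example_iama.yaml # Hypothetical file.\n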
The standard will be written in YAML. Example YAML files are given in the next section. The standard defines three cards: a system_card
, a model_card
and an assessment_card
. A system_card
contains information about an algorithmic system. It can have multiple models and each of these models should have a model_card
. Regulatory assessments can be processed in an assessment_card
. Note that model_card
s and assessment_card
s can be included directly into the system_card
or can be included as separate YAML files with the help of a YAML-include mechanism. For clarity, the latter is preferred and is also used in the examples in the next section.
system_card
","text":"A system_card
contains the following information.
schema_version
(REQUIRED, string). Version of the schema used, for example \"0.1a2\".provenance
(OPTIONAL). In case this System Card is generated from another source file, this field can capture the historical context of the contents of this System Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this System Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, e.g. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of the person who initiated the transformations.name
(OPTIONAL, string). Name used to describe the system.
upl
(OPTIONAL, string). If this algorithm is part of a product offered by the Dutch Government, it should contain a URI from the Uniform Product List.owners
(list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable, the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the system. If oin
is NOT provided, this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.description
(OPTIONAL, string). A short description of the system.
labels
(OPTIONAL, list). This field allows storing meta information about a system. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). Name of the label.value
(OPTIONAL, string). Value of the label.status
(OPTIONAL, string). The status of the system. For example, the status can be \"production\".
publication_category
(OPTIONAL, enum[string]). The publication category of the algorithm should be chosen from [\"high_risk\", \"other\"]
.begin_date
(OPTIONAL, string). The first date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.end_date
(OPTIONAL, string). The last date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.goal_and_impact
(OPTIONAL, string). The purpose of the system and the impact it has on citizens and companies.considerations
(OPTIONAL, string). The pros and cons of using the system.risk_management
(OPTIONAL, string). Description of the risks associated with the system.human_intervention
(OPTIONAL, string). A description of to what extent there is human involvement in the system.legal_base
(OPTIONAL, list). If there exists a legal base for the process the system is embedded in, this field can be filled in with the relevant laws. There can be multiple legal bases. For each legal base the following fields are present.name
(OPTIONAL, string). Name of the law.link
(OPTIONAL, string). URI pointing towards the contents of the law.used_data
(OPTIONAL, string). An overview of the data that is used in the system.technical_design
(OPTIONAL, string). Description of how the system works.external_providers
(OPTIONAL, list[string]). Name of an external provider, if relevant. There can be multiple external providers.references
(OPTIONAL, list[string]). Additional reference URIs that point to information about the system and are relevant.models
(OPTIONAL, list[ModelCard]). A list of model cards (as defined below) or !include
s of a YAML file containing a model card. This model card can, for example, be a model card described in the next section or a model card from Hugging Face. There can be multiple model cards, meaning multiple models are used.assessments
(OPTIONAL, list[AssessmentCard]). A list of assessment cards (as defined below) or !include
s of a YAML file containing an assessment card. Such an assessment card is described in the next section. There can be multiple assessment cards, meaning multiple assessments were performed.model_card
","text":"A model_card
contains the following information.
provenance
(OPTIONAL). In case this Model Card is generated from another source file, this field can capture the historical context of the contents of this Model Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this Model Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, e.g. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of the person who initiated the transformations.language
(OPTIONAL, list[string]). If relevant, the natural languages the model supports in ISO 639. There can be multiple languages.
license
(REQUIRED, string). Any license from the open source license list 1. If the license is NOT present in the license list, this field must be set to 'other' and the following two fields will be REQUIRED.
license_name
(string). An id for the license.license_link
(string). A link to a file of that name inside the repo, or a URL to a remote file containing the license contents.tags
(OPTIONAL, list[string]). Tags with keywords to describe the project. There can be multiple tags.
owners
(list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable, the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the model. If oin
is NOT provided, this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.There can be multiple models. For each model the following fields are present.
name
(REQUIRED, string). The name of the model.model
(REQUIRED, string). A URI pointing to a repository containing the model file.artifacts
(OPTIONAL, list). A list of artifacts.
uri
(OPTIONAL, string). URI that refers to a relevant model artifact.content-type
(OPTIONAL, string). Optional type, following the Content-Type convention. Recognized values are \"application/onnx\", to refer to an ONNX representation of the model.md5-checksum
(OPTIONAL, string). Optional checksum for the content of the file.parameters
(list). There can be multiple parameters. For each parameter the following fields are present.
name
(REQUIRED, string). The name of the parameter, for example \"epochs\".dtype
(OPTIONAL, string). The datatype of the parameter, for example \"int\".value
(OPTIONAL, string). The value of the parameter, for example 100.labels
(list). This field allows storing meta information about a parameter. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the label.dtype
(OPTIONAL, string). The datatype of the label. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the label. If name
is set, this field is REQUIRED.results
(list). There can be multiple results. For each result the following fields are present.
task
(OPTIONAL, list).
task_type
(REQUIRED, string). The task of the model, for example \"object-classification\".task_name
(OPTIONAL, string). A pretty name for the model tasks, for example \"Object Classification\".datasets
(list). There can be multiple datasets 2. For each dataset the following fields are present.
type
(REQUIRED, string). The type of the dataset, can be a dataset id from Hugging Face datasets or any other link to a repository containing the dataset3, for example \"common_voice\".name
(REQUIRED, string). A pretty name for the dataset, for example \"Common Voice (French)\".split
(OPTIONAL, string). The split of the dataset, for example \"train\".features
(OPTIONAL, list[string]). List of feature names.revision
(OPTIONAL, string). Version of the dataset, for example 5503434ddd753f426f4b38109466949a1217c2bb.metrics
(list). There can be multiple metrics. For each metric the following fields are present.
type
(REQUIRED, string). A metric-id from Hugging Face metrics4, for example accuracy.name
(REQUIRED, string). A descriptive name of the metric. For example \"false positive rate\" is not a descriptive name, but \"training false positive rate w.r.t class x\" is.dtype
(REQUIRED, string). The data type of the metric, for example float
.value
(REQUIRED, string). The value of the metric.labels
(list). This field allows storing meta information about a metric. Metrics can, for example, be computed on subgroups of specific features: one can compute the accuracy for examples where the feature \"gender\" is set to \"male\". There can be multiple subgroups, which means that the metric is computed on the intersection of those subgroups. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the feature. For example: \"gender\".type
(OPTIONAL, string). The type of the label. Can for example be set to \"feature\" or \"output_class\". If name
is set, this field is REQUIRED.dtype
(OPTIONAL, string). The datatype of the feature, for example float
. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the feature. If name
is set, this field is REQUIRED. For example: \"male\".measurements
.
bar_plots
(list). The purpose of this field is to capture bar-plot-like measurements, for example SHAP values. There can be multiple bar plots. For each bar plot the following fields are present.
type
(REQUIRED, string). The type of bar plot, for example \"SHAP\".name
(OPTIONAL, string). A pretty name for the plot, for example \"Mean Absolute SHAP Values\".results
(list). The contents of the bar plot. A result represents a bar. There can be multiple results. For each result the following fields are present.name
(REQUIRED, string). The name of the bar.value
(REQUIRED, float). The value of the corresponding bar.graph_plots
(list). The purpose of this field is to capture graph-plot-like measurements, such as partial dependence plots. There can be multiple graph plots. For each graph plot the following fields are present.
type
(REQUIRED, string). The type of the graph plot, for example \"partial_dependence\".name
(OPTIONAL, string). A pretty name of the graph, for example \"Partial Dependence Plot\".results
(list). Results contains the graph plot data. Each graph can depend on a specific output class and feature. There can be multiple results. For each result the following fields are present.class
(OPTIONAL, string/int/float/bool). The output class name that the graph corresponds to. This field is not always present.feature
(REQUIRED, string). The feature the graph corresponds to. This is required, since all relevant graphs are dependent on features.data
(list)x_value
(REQUIRED, float). The $x$-value of the graph.y_value
(REQUIRED, float). The $y$-value of the graph.assessment_card
","text":"An assessment_card
contains the following information.
provenance
(OPTIONAL). In case this Assessment Card is generated from another source file, this field can capture the historical context of the contents of this Assessment Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this Assessment Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, e.g. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of the person who initiated the transformations.name
(REQUIRED, string). The name of the assessment.
date
(REQUIRED, string). The date at which the assessment is completed. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.contents
(list). There can be multiple items in contents. For each item the following fields are present:
question
(REQUIRED, string). A question.answer
(REQUIRED, string). An answer.remarks
(OPTIONAL, string). A field to put relevant discussion remarks in.authors
(OPTIONAL, list). There can be multiple names. For each name the following field is present.name
(OPTIONAL, string). The name of the author of the question.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of the answer. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, e.g. 2024-04-16T16:48:14Z
.version: {system_card_version} # Optional. Example: \"0.1a1\"\nprovenance: # Optional.\n git_commit_hash: {git_commit_hash} # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb\n timestamp: {modification_timestamp} # Optional. Example: 2024-04-16T16:48:14Z.\n uri: {modification_uri} # Optional. Example: https://github.com/MinBZK/tad-conversion-tool\n author: {modification_author} # Optional. Example: John Doe\nname: {system_name} # Optional. Example: \"AangifteVertrekBuitenland\"\nupl: {upl_uri} # Optional. Example: https://standaarden.overheid.nl/owms/terms/AangifteVertrekBuitenland\nowners:\n- oin: {oin} # Optional. Example: 00000001003214345000\n organization: {organization_name} # Optional if oin is provided, Required otherwise. Example: BZK\n name: {owner_name} # Optional. Example: John Doe\n email: {owner_email} # Optional. Example: johndoe@email.com\n role: {owner_role} # Optional. Example: Data Scientist.\ndescription: {system_description} # Optional. Short description of the system.\nlabels: # Optional. Labels to store metadata about the system.\n- name: {label_name} # Optional.\n value: {label_value} # Optional.\nstatus: {system_status} # Optional. Example: \"production\".\npublication_category: {system_publication_cat} # Optional. Example: \"high_risk\".\nbegin_date: {system_begin_date} # Optional. Example: 2025-01-01.\nend_date: {system_end_date} # Optional. Example: 2025-12-01.\ngoal_and_impact: {system_goal_and_impact} # Optional. Goal and impact of the system.\nconsiderations: {system_considerations} # Optional. Considerations about the system.\nrisk_management: {system_risk_management} # Optional. Description of risks associated with the system.\nhuman_intervention: {system_human_intervention} # Optional. Description of human involvement in the system.\nlegal_base:\n- name: {law_name} # Optional. Example: \"AVG\".\n link: {law_uri} # Optional. Example: \"https://eur-lex.europa.eu/legal-content/NL/TXT/HTML/?uri=CELEX:31995L0046\".\nused_data: {system_used_data} # Optional. Description of the data used by the system.\ntechnical_design: {technical_design} # Optional. Description of the technical design of the system.\nexternal_providers:\n- {system_external_provider} # Optional. Reference to used external providers.\nreferences:\n- {reference_uri} # Optional. Example: URI to codebase.\n\nmodels:\n- !include {model_card_uri} # Optional. Example: cat_classifier_model.yaml.\n\nassessments:\n- !include {assessment_card_uri} # Required. Example: iama.yaml.\n
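For illustration, a provenance block filled in with the example values from the template above reads: provenance:\n git_commit_hash: 5503434ddd753f426f4b38109466949a1217c2bb\n timestamp: 2024-04-16T16:48:14Z\n uri: https://github.com/MinBZK/tad-conversion-tool\n author: John Doe\n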
"},{"location":"projects/tad/reporting-standard/0.1a4/#model-card","title":"Model Card","text":"provenance: # Optional.\n git_commit_hash: {git_commit_hash} # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb\n timestamp: {modification_timestamp} # Optional. Example: 2024-04-16T16:48:14Z.\n uri: {modification_uri} # Optional. Example: https://github.com/MinBZK/tad-conversion-tool\n author: {modification_author} # Optional. Example: John Doe\nlanguage:\n - {lang_0} # Optional. Example nl.\nlicense: {license} # Required. Example: Apache-2.0 or any license SPDX ID from https://opensource.org/license or \"other\".\nlicense_name: {license_name} # Optional if license != other, Required otherwise. Example: 'my-license-1.0'\nlicense_link: {license_link} # Optional if license != other, Required otherwise. Specify \"LICENSE\" or \"LICENSE.md\" to link to a file of that name inside the repo, or a URL to a remote file.\ntags:\n- {tag_0} # Optional. Example: audio\n- {tag_1} # Optional. Example: automatic-speech-recognition\nowners:\n- organization: {organization_name} # Required. Example: BZK\n oin: {oin} # Optional. Example: 00000001003214345000\n name: {owner_name} # Optional. Example: John Doe\n email: {owner_email} # Optional. Example: johndoe@email.com\n role: {owner_role} # Optional. Example: Data Scientist.\n\nmodel-index:\n- name: {model_id} # Required. Example: CatClassifier.\n model: {model_uri} # Required. URI to a repository containing the model file.\n artifacts:\n - uri: {model_artifact_uri} # Optional. Example: \"https://github.com/MinBZK/poc-kijkdoos-wasm-models/raw/main/logres_iris/logreg_iris.onnx\"\n - content-type: {model_artifact_type} # Optional. Example: \"application/onnx\".\n - md5-checksum: {md5_checksum} # Optional. Example: \"120EA8A25E5D487BF68B5F7096440019\"\n parameters:\n - name: {parameter_name} # Optional. Example: \"epochs\".\n dtype: {parameter_dtype} # Optional. Example: \"int\".\n value: {parameter_value} # Optional. Example: 100.\n labels:\n - name: {label_name} # Optional. Example: \"gender\".\n dtype: {label_type} # Optional. Example: \"string\".\n value: {label_value} # Optional. Example: \"female\".\n results:\n - task:\n type: {task_type} # Required. Example: image-classification.\n name: {task_name} # Optional. Example: Image Classification.\n datasets:\n - type: {dataset_type} # Required. Example: common_voice. Link to a repository containing the dataset\n name: {dataset_name} # Required. Example: \"Common Voice (French)\". A pretty name for the dataset.\n split: {split} # Optional. Example: \"train\".\n features:\n - {feature_name} # Optional. Example: \"gender\".\n revision: {dataset_version} # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb\n metrics:\n - type: {metric_type} # Required. Example: false-positive-rate. Use metric id from https://hf.co/metrics.\n name: {metric_name} # Required. Example: \"FPR wrt class 0 restricted to feature gender:0 and age:21\".\n dtype: {metric_dtype} # Required. Example: \"float\".\n value: {metric_value} # Required. Example: 0.75.\n labels:\n - name: {label_name} # Optional. Example: \"gender\".\n type: {label_type} # Optional. Example: \"feature\".\n dtype: {label_type} # Optional. Example: \"string\".\n value: {label_value} # Optional. Example: \"female\".\n measurements:\n # Bar plots should be able to capture SHAP and Robustness Toolbox from AI Verify.\n bar_plots:\n - type: {measurement_type} # Required. Example: \"SHAP\".\n name: {measurement_name} # Optional. 
Example: \"Mean Absolute Shap Values\".\n results:\n - name: {bar_name} # Required. The name of a bar.\n value: {bar_value} # Required. The corresponding value.\n # Graph plots should be able to capture graph based measurements such as partial dependence and accumulated local effect.\n graph_plots:\n - type: {measurement_type} # Required. Example: \"partial_dependence\".\n name: {measurement_name} # Optional. Example: \"Partial Dependence Plot\".\n # Results store the graph plot data. So far all plots are dependent on a combination of a specific class (sometimes) and feature (always).\n # For example partial dependence plots are made for each feature and class.\n results:\n - class: {class_name} # Optional. Name of the output class the graph depends on.\n feature: {feature_name} # Required. Name of the feature the graph depends on.\n data:\n - x_value: {x_value} # Required. The x value of the graph data.\n y_value: {y_value} # Required. The y value of the graph data.\n
"},{"location":"projects/tad/reporting-standard/0.1a4/#assessment-card","title":"Assessment Card","text":"provenance: # Optional.\n git_commit_hash: {git_commit_hash} # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb\n timestamp: {modification_timestamp} # Optional. Example: 2024-04-16T16:48:14Z.\n uri: {modification_uri} # Optional. Example: https://github.com/MinBZK/tad-conversion-tool\n author: {modification_author} # Optional. Example: John Doe\nname: {assessment_name} # Required. Example: IAMA.\ndate: {assessment_date} # Required. Example: 25-03-2025.\ncontents:\n - question: {question_text} # Required. Example: \"Question 1: ...\".\n answer: {answer_text} # Required. Example: \"Answer: ...\".\n remarks: {remarks_text} # Optional. Example: \"Remarks: ...\".\n authors: # Optional. Example: \"['John', 'Peter']\".\n - name: {author_name}\n timestamp: {timestamp} # Optional. Example: 2024-04-16T16:48:14Z.\n
"},{"location":"projects/tad/reporting-standard/0.1a4/#schema","title":"Schema","text":"JSON schema will be added when we publish the first beta version.
"},{"location":"projects/tad/reporting-standard/0.1a4/#changelog","title":"Changelog","text":"Deviation from the Hugging Face specification is in the License field. Hugging Face only accepts dataset id's from Hugging Face license list while we accept any license from Open Source License List.\u00a0\u21a9\u21a9
Deviation from the Hugging Face specification is in the model-index:results:datasets
field. Hugging Face only accepts one dataset, while we accept a list of datasets.
Deviation from the Hugging Face specification is in the Dataset Type field. Hugging Face only accepts dataset IDs from Hugging Face datasets, while we also allow any URL pointing to the dataset.
For this extension to work, relevant metrics (such as, for example, the false positive rate) have to be added to the Hugging Face metrics; possibly this can be done in our organizational namespace.
This document describes the Transparency of Algorithmic Decision making (TAD) Reporting Standard.
For reproducibility, governance, auditing and sharing of algorithmic systems it is essential to have a reporting standard so that information about an algorithmic system can be shared. This reporting standard describes how information about the different phases of an algorithm's life cycle can be reported. It contains, among other things, descriptive information combined with information about the technical tests and assessments applied.
Disclaimer
The TAD Reporting Standard is a work in progress. This means that the current standard is probably suboptimal and will change significantly in future versions.
"},{"location":"projects/tad/reporting-standard/0.1a5/#introduction","title":"Introduction","text":"Inspired by Model Cards for Model Reporting and Papers with Code Model Index this standard almost 1 2 3 4 extends the Hugging Face model card metadata specification to allow for:
metrics_field
from the Hugging Face metadata specification.measurements
.assessments
.Following Hugging Face, this proposed standard will be written in YAML.
This standard does not contain all fields present in the Hugging Face metadata specification. The fields that are optional in the Hugging Face specification and are specific to the Hugging Face interface are omitted.
Another difference is that we divide our implementation into three separate parts.
system_card
, containing information about a group of ML-models which accomplish a specific task.model_card
, containing information about a specific data science model.assessment_card
, containing information about a regulatory assessment.Include statements
These model_card
s and assessment_card
s can be included verbatim into a system_card
, or referenced with an !include
statement, allowing for minimal cards to be compact in a single file. Extensive cards can be split up for readability and maintainability. Our standard allows for the !include
to be used anywhere.
The standard will be written in YAML. Example YAML files are given in the next section. The standard defines three cards: a system_card
, a model_card
and an assessment_card
. A system_card
contains information about an algorithmic system. It can have multiple models and each of these models should have a model_card
. Regulatory assessments can be processed in an assessment_card
. Note that model_card
s and assessment_card
s can be included directly into the system_card
or can be included as separate YAML files with the help of a YAML-include mechanism. For clarity, the latter is preferred and is also used in the examples in the next section.
system_card
","text":"A system_card
contains the following information.
schema_version
(REQUIRED, string). Version of the schema used, for example \"0.1a2\".provenance
(OPTIONAL). In case this System Card is generated from another source file, this field can capture the historical context of the contents of this System Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this System Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, e.g. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of the person who initiated the transformations.name
(OPTIONAL, string). Name used to describe the system.
upl
(OPTIONAL, string). If this algorithm is part of a product offered by the Dutch Government, it should contain a URI from the Uniform Product List.owners
(list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable, the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the system. If oin
is NOT provided, this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.description
(OPTIONAL, string). A short description of the system.
labels
(OPTIONAL, list). This field allows storing meta information about a system. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). Name of the label.value
(OPTIONAL, string). Value of the label.status
(OPTIONAL, string). The status of the system. For example, the status can be \"production\".
publication_category
(OPTIONAL, enum[string]). The publication category of the algorithm should be chosen from [\"high_risk\", \"other\"]
.begin_date
(OPTIONAL, string). The first date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.end_date
(OPTIONAL, string). The last date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.goal_and_impact
(OPTIONAL, string). The purpose of the system and the impact it has on citizens and companies.considerations
(OPTIONAL, string). The pros and cons of using the system.risk_management
(OPTIONAL, string). Description of the risks associated with the system.human_intervention
(OPTIONAL, string). A description of to what extent there is human involvement in the system.legal_base
(OPTIONAL, list). If there exists a legal base for the process the system is embedded in, this field can be filled in with the relevant laws. There can be multiple legal bases. For each legal base the following fields are present.name
(OPTIONAL, string). Name of the law.link
(OPTIONAL, string). URI pointing towards the contents of the law.used_data
(OPTIONAL, string). An overview of the data that is used in the system.technical_design
(OPTIONAL, string). Description of how the system works.external_providers
(OPTIONAL, list). If relevant, these fields allow storing information on external providers. There can be multiple external providers.name
(OPTIONAL, string). Name of the external provider.version
(OPTIONAL, string). Version of the external provider reflecting its relation to previous versions.references
(OPTIONAL, list[string]). Additional reference URIs that point to information about the system and are relevant.interaction_details
(OPTIONAL, list[string]). Explain how the AI system interacts with hardware or software, including other AI systems, or how the AI system can be used to interact with hardware or software.version_requirements
(OPTIONAL, list[string]). Describe the versions of the relevant software or firmware, and any requirements related to version updates.deployment_variants
(OPTIONAL, list[string]). Description of all the forms in which the AI system is placed on the market or put into service, such as software packages embedded into hardware, downloads, or APIs.hardware_requirements
(OPTIONAL, list[string]). Provide a description of the hardware on which the AI system must be run.product_markings
(OPTIONAL, list[string]). If the AI system is a component of products, photos, or illustrations, describe the external features, markings, and internal layout of those products.user_interface
(OPTIONAL, list). Provide information on the user interface provided to the user responsible for its operation.description
(OPTIONAL, string). A description of the provided user interface.link
(OPTIONAL, string). A link to the user interface can be included.snapshot
(OPTIONAL, string). A snapshot/screenshot of the user interface can be included with the use of a hyperlink.models
(OPTIONAL, list[ModelCard]). A list of model cards (as defined below) or !include
s of a YAML file containing a model card. This model card can, for example, be a model card described in the next section or a model card from Hugging Face. There can be multiple model cards, meaning multiple models are used.assessments
(OPTIONAL, list[AssessmentCard]). A list of assessment cards (as defined below) or !include
s of a YAML file containing an assessment card. Such an assessment card is described in the next section. There can be multiple assessment cards, meaning multiple assessments were performed.model_card
","text":"A model_card
contains the following information.
provenance
(OPTIONAL). In case this Model Card is generated from another source file, this field can capture the historical context of the contents of this Model Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this Model Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, e.g. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of the person who initiated the transformations.language
(OPTIONAL, list[string]). If relevant, the natural languages the model supports in ISO 639. There can be multiple languages.
license
(REQUIRED, string). Any license from the open source license list 1. If the license is NOT present in the license list, this field must be set to 'other' and the following two fields will be REQUIRED.
license_name
(string). An id for the license.license_link
(string). A link to a file of that name inside the repo, or a URL to a remote file containing the license contents.tags
(OPTIONAL, list[string]). Tags with keywords to describe the project. There can be multiple tags.
owners
(list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable, the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the model. If oin
is NOT provided, this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.There can be multiple models. For each model the following fields are present.
name
(REQUIRED, string). The name of the model.model
(REQUIRED, string). A URI pointing to a repository containing the model file.artifacts
(OPTIONAL, list). A list of artifacts.
uri
(OPTIONAL, string). URI that refers to a relevant model artifact.content-type
(OPTIONAL, string). Optional type, following the Content-Type convention. Recognized values are \"application/onnx\", to refer to an ONNX representation of the model.md5-checksum
(OPTIONAL, string). Optional checksum for the content of the file.parameters
(list). There can be multiple parameters. For each parameter the following fields are present.
name
(REQUIRED, string). The name of the parameter, for example \"epochs\".dtype
(OPTIONAL, string). The datatype of the parameter, for example \"int\".value
(OPTIONAL, string). The value of the parameter, for example 100.labels
(list). This field allows storing meta information about a parameter. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the label.dtype
(OPTIONAL, string). The datatype of the label. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the label. If name
is set, this field is REQUIRED.results
(list). There can be multiple results. For each result the following fields are present.
task
(OPTIONAL, list).
task_type
(REQUIRED, string). The task of the model, for example \"object-classification\".task_name
(OPTIONAL, string). A pretty name for the model tasks, for example \"Object Classification\".datasets
(list). There can be multiple datasets 2. For each dataset the following fields are present.
type
(REQUIRED, string). The type of the dataset, can be a dataset id from Hugging Face datasets or any other link to a repository containing the dataset3, for example \"common_voice\".name
(REQUIRED, string). A pretty name for the dataset, for example \"Common Voice (French)\".split
(OPTIONAL, string). The split of the dataset, for example \"train\".features
(OPTIONAL, list[string]). List of feature names.revision
(OPTIONAL, string). Version of the dataset, for example 5503434ddd753f426f4b38109466949a1217c2bb.metrics
(list). There can be multiple metrics. For each metric the following fields are present.
type
(REQUIRED, string). A metric-id from Hugging Face metrics4, for example accuracy.name
(REQUIRED, string). A descriptive name of the metric. For example \"false positive rate\" is not a descriptive name, but \"training false positive rate w.r.t class x\" is.dtype
(REQUIRED, string). The data type of the metric, for example float
.value
(REQUIRED, string). The value of the metric.labels
(list). This field allows storing meta information about a metric. Metrics can, for example, be computed on subgroups of specific features: one can compute the accuracy for examples where the feature \"gender\" is set to \"male\". There can be multiple subgroups, which means that the metric is computed on the intersection of those subgroups. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the feature. For example: \"gender\".type
(OPTIONAL, string). The type of the label. Can for example be set to \"feature\" or \"output_class\". If name
is set, this field is REQUIRED.dtype
(OPTIONAL, string). The datatype of the feature, for example float
. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the feature. If name
is set, this field is REQUIRED. For example: \"male\".measurements
.
bar_plots
(list). The purpose of this field is to capture bar-plot-like measurements, for example SHAP values. There can be multiple bar plots. For each bar plot the following fields are present.
type
(REQUIRED, string). The type of bar plot, for example \"SHAP\".name
(OPTIONAL, string). A pretty name for the plot, for example \"Mean Absolute SHAP Values\".results
(list). The contents of the bar plot. A result represents a bar. There can be multiple results. For each result the following fields are present.name
(REQUIRED, string). The name of the bar.value
(REQUIRED, float). The value of the corresponding bar.graph_plots
(list). The purpose of this field is to capture graph-plot-like measurements, such as partial dependence plots. There can be multiple graph plots. For each graph plot the following fields are present.
type
(REQUIRED, string). The type of the graph plot, for example \"partial_dependence\".name
(OPTIONAL, string). A pretty name of the graph, for example \"Partial Dependence Plot\".results
(list). Results contains the graph plot data. Each graph can depend on a specific output class and feature. There can be multiple results. For each result the following fields are present.class
(OPTIONAL, string/int/float/bool). The output class name that the graph corresponds to. This field is not always present.feature
(REQUIRED, string). The feature the graph corresponds to. This is required, since all relevant graphs are dependent on features.data
(list)x_value
(REQUIRED, float). The $x$-value of the graph.y_value
(REQUIRED, float). The $y$-value of the graph.assessment_card
","text":"An assessment_card
contains the following information.
provenance
(OPTIONAL). In case this Assessment Card is generated from another source file, this field can capture the historical context of the contents of this Assessment Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this Assessment Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, e.g. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of the person who initiated the transformations.name
(REQUIRED, string). The name of the assessment.
date
(REQUIRED, string). The date at which the assessment is completed. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.contents
(list). There can be multiple items in contents. For each item the following fields are present:
question
(REQUIRED, string). A question.answer
(REQUIRED, string). An answer.remarks
(OPTIONAL, string). A field to put relevant discussion remarks in.authors
(OPTIONAL, list). There can be multiple names. For each name the following field is present.name
(OPTIONAL, string). The name of the author of the question.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of the answer. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, e.g. 2024-04-16T16:48:14Z
.version: {system_card_version} # Optional. Example: \"0.1a1\"\nprovenance: # Optional.\n git_commit_hash: {git_commit_hash} # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb\n timestamp: {modification_timestamp} # Optional. Example: 2024-04-16T16:48:14Z.\n uri: {modification_uri} # Optional. Example: https://github.com/MinBZK/tad-conversion-tool\n author: {modification_author} # Optional. Example: John Doe\nname: {system_name} # Optional. Example: \"AangifteVertrekBuitenland\"\nupl: {upl_uri} # Optional. Example: https://standaarden.overheid.nl/owms/terms/AangifteVertrekBuitenland\nowners:\n- oin: {oin} # Optional. Example: 00000001003214345000\n organization: {organization_name} # Optional if oin is provided, Required otherwise. Example: BZK\n name: {owner_name} # Optional. Example: John Doe\n email: {owner_email} # Optional. Example: johndoe@email.com\n role: {owner_role} # Optional. Example: Data Scientist.\ndescription: {system_description} # Optional. Short description of the system.\nlabels: # Optional. Labels to store metadata about the system.\n- name: {label_name} # Optional.\n value: {label_value} # Optional.\nstatus: {system_status} # Optional. Example: \"production\".\npublication_category: {system_publication_cat} # Optional. Example: \"high_risk\".\nbegin_date: {system_begin_date} # Optional. Example: 2025-01-01.\nend_date: {system_end_date} # Optional. Example: 2025-12-01.\ngoal_and_impact: {system_goal_and_impact} # Optional. Goal and impact of the system.\nconsiderations: {system_considerations} # Optional. Considerations about the system.\nrisk_management: {system_risk_management} # Optional. Description of risks associated with the system.\nhuman_intervention: {system_human_intervention} # Optional. Description of human involvement in the system.\nlegal_base:\n- name: {law_name} # Optional. Example: \"AVG\".\n link: {law_uri} # Optional. Example: \"https://eur-lex.europa.eu/legal-content/NL/TXT/HTML/?uri=CELEX:31995L0046\".\nused_data: {system_used_data} # Optional. Description of the data used by the system.\ntechnical_design: {technical_design} # Optional. Description of the technical design of the system.\nexternal_providers:\n- name: {name_external_provider} # Optional. Reference to used external providers.\n version: {version_external_provider} # Optional. Version used of the external provider.\nreferences:\n- {reference_uri} # Optional. Example: URI to codebase.\ninteraction_details:\n- {system_interaction_details} # Optional. Example: \"GPS modules for location tracking\"\nversion_requirements:\n- {system_version_requirements} # Optional. Example: \">version2.1\"\ndeployment_variants:\n- {system_deployment_variants} # Optional. Example: \"Web Application\"\nhardware_requirements:\n- {system_hardware_requirements} # Optional. Example: \"8 cores, 16 threads CPU\"\nproduct_markings:\n- {system_product_markings} # Optional. Example: \"Model number in the info menu\"\nuser_interface:\n- description: {system_user_interface} # Optional. Example: \"web-based dashboard\"\n link: {system_user_interface_uri} # Optional. Example: \"http://example.com/content\"\n snapshot: {system_user_interface_snapshot_uri} # Optional. Example: \"http://example.com/snapshot.png\"\n\nmodels:\n- !include {model_card_uri} # Optional. Example: cat_classifier_model.yaml.\n\nassessments:\n- !include {assessment_card_uri} # Required. Example: iama.yaml.\n
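As a sketch of the fields introduced in this version, filled in with the illustrative values from the template above (the provider name is invented): external_providers:\n- name: ExampleProvider # Hypothetical provider.\n version: \"2.1\"\ninteraction_details:\n- \"GPS modules for location tracking\"\ndeployment_variants:\n- \"Web Application\"\nuser_interface:\n- description: \"web-based dashboard\"\n link: \"http://example.com/content\"\n snapshot: \"http://example.com/snapshot.png\"\n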
"},{"location":"projects/tad/reporting-standard/0.1a5/#model-card","title":"Model Card","text":"provenance: # Optional.\n git_commit_hash: {git_commit_hash} # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb\n timestamp: {modification_timestamp} # Optional. Example: 2024-04-16T16:48:14Z.\n uri: {modification_uri} # Optional. Example: https://github.com/MinBZK/tad-conversion-tool\n author: {modification_author} # Optional. Example: John Doe\nlanguage:\n - {lang_0} # Optional. Example nl.\nlicense: {license} # Required. Example: Apache-2.0 or any license SPDX ID from https://opensource.org/license or \"other\".\nlicense_name: {license_name} # Optional if license != other, Required otherwise. Example: 'my-license-1.0'\nlicense_link: {license_link} # Optional if license != other, Required otherwise. Specify \"LICENSE\" or \"LICENSE.md\" to link to a file of that name inside the repo, or a URL to a remote file.\ntags:\n- {tag_0} # Optional. Example: audio\n- {tag_1} # Optional. Example: automatic-speech-recognition\nowners:\n- organization: {organization_name} # Required. Example: BZK\n oin: {oin} # Optional. Example: 00000001003214345000\n name: {owner_name} # Optional. Example: John Doe\n email: {owner_email} # Optional. Example: johndoe@email.com\n role: {owner_role} # Optional. Example: Data Scientist.\n\nmodel-index:\n- name: {model_id} # Required. Example: CatClassifier.\n model: {model_uri} # Required. URI to a repository containing the model file.\n artifacts:\n - uri: {model_artifact_uri} # Optional. Example: \"https://github.com/MinBZK/poc-kijkdoos-wasm-models/raw/main/logres_iris/logreg_iris.onnx\"\n - content-type: {model_artifact_type} # Optional. Example: \"application/onnx\".\n - md5-checksum: {md5_checksum} # Optional. Example: \"120EA8A25E5D487BF68B5F7096440019\"\n parameters:\n - name: {parameter_name} # Optional. Example: \"epochs\".\n dtype: {parameter_dtype} # Optional. Example: \"int\".\n value: {parameter_value} # Optional. Example: 100.\n labels:\n - name: {label_name} # Optional. Example: \"gender\".\n dtype: {label_type} # Optional. Example: \"string\".\n value: {label_value} # Optional. Example: \"female\".\n results:\n - task:\n type: {task_type} # Required. Example: image-classification.\n name: {task_name} # Optional. Example: Image Classification.\n datasets:\n - type: {dataset_type} # Required. Example: common_voice. Link to a repository containing the dataset\n name: {dataset_name} # Required. Example: \"Common Voice (French)\". A pretty name for the dataset.\n split: {split} # Optional. Example: \"train\".\n features:\n - {feature_name} # Optional. Example: \"gender\".\n revision: {dataset_version} # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb\n metrics:\n - type: {metric_type} # Required. Example: false-positive-rate. Use metric id from https://hf.co/metrics.\n name: {metric_name} # Required. Example: \"FPR wrt class 0 restricted to feature gender:0 and age:21\".\n dtype: {metric_dtype} # Required. Example: \"float\".\n value: {metric_value} # Required. Example: 0.75.\n labels:\n - name: {label_name} # Optional. Example: \"gender\".\n type: {label_type} # Optional. Example: \"feature\".\n dtype: {label_type} # Optional. Example: \"string\".\n value: {label_value} # Optional. Example: \"female\".\n measurements:\n # Bar plots should be able to capture SHAP and Robustness Toolbox from AI Verify.\n bar_plots:\n - type: {measurement_type} # Required. Example: \"SHAP\".\n name: {measurement_name} # Optional. 
Example: \"Mean Absolute Shap Values\".\n results:\n - name: {bar_name} # Required. The name of a bar.\n value: {bar_value} # Required. The corresponding value.\n # Graph plots should be able to capture graph based measurements such as partial dependence and accumulated local effect.\n graph_plots:\n - type: {measurement_type} # Required. Example: \"partial_dependence\".\n name: {measurement_name} # Optional. Example: \"Partial Dependence Plot\".\n # Results store the graph plot data. So far all plots are dependent on a combination of a specific class (sometimes) and feature (always).\n # For example partial dependence plots are made for each feature and class.\n results:\n - class: {class_name} # Optional. Name of the output class the graph depends on.\n feature: {feature_name} # Required. Name of the feature the graph depends on.\n data:\n - x_value: {x_value} # Required. The x value of the graph data.\n y_value: {y_value} # Required. The y value of the graph data.\n
"},{"location":"projects/tad/reporting-standard/0.1a5/#assessment-card","title":"Assessment Card","text":"provenance: # Optional.\n git_commit_hash: {git_commit_hash} # Optional. Example: 5503434ddd753f426f4b38109466949a1217c2bb\n timestamp: {modification_timestamp} # Optional. Example: 2024-04-16T16:48:14Z.\n uri: {modification_uri} # Optional. Example: https://github.com/MinBZK/tad-conversion-tool\n author: {modification_author} # Optional. Example: John Doe\nname: {assessment_name} # Required. Example: IAMA.\ndate: {assessment_date} # Required. Example: 25-03-2025.\ncontents:\n - question: {question_text} # Required. Example: \"Question 1: ...\".\n answer: {answer_text} # Required. Example: \"Answer: ...\".\n remarks: {remarks_text} # Optional. Example: \"Remarks: ...\".\n authors: # Optional. Example: \"['John', 'Peter']\".\n - name: {author_name}\n timestamp: {timestamp} # Optional. Example: 2024-04-16T16:48:14Z.\n
"},{"location":"projects/tad/reporting-standard/0.1a5/#schema","title":"Schema","text":"JSON schema will be added when we publish the first beta version.
"},{"location":"projects/tad/reporting-standard/0.1a5/#changelog","title":"Changelog","text":"Deviation from the Hugging Face specification is in the License field. Hugging Face only accepts dataset id's from Hugging Face license list while we accept any license from Open Source License List.\u00a0\u21a9\u21a9
Deviation from the Hugging Face specification is in the model-index:results:datasets
field. Hugging Face only accepts one dataset, while we accept a list of datasets.
Deviation from the Hugging Face specification is in the Dataset Type field. Hugging Face only accepts dataset id's from Hugging Face datasets while we also allow for any url pointing to the dataset.\u00a0\u21a9\u21a9
For this extension to work relevant metrics (such as for example false positive rate) have to be added to the Hugging Face metrics, possibly this can be done in our organizational namespace.\u00a0\u21a9\u21a9
This document describes the Transparency of Algorithmic Decision making (TAD) Reporting Standard.
For reproducibility, governance, auditing, and sharing of algorithmic systems, it is essential to have a reporting standard that defines how information about an algorithmic system is shared. This reporting standard describes how information about the different phases of an algorithm's life cycle can be reported. It contains, among other things, descriptive information combined with information about the technical tests and assessments applied.
Disclaimer
The TAD Reporting Standard is a work in progress: the current standard is likely suboptimal and will change significantly in future versions.
"},{"location":"projects/tad/reporting-standard/0.1a6/#introduction","title":"Introduction","text":"Inspired by Model Cards for Model Reporting and Papers with Code Model Index this standard almost1 2 3 4 extends the Hugging Face model card metadata specification to allow for:
metrics_field
from the Hugging Face metadata specification.measurements
.assessments
.Following Hugging Face, this proposed standard will be written in YAML.
This standard does not contain all fields present in the Hugging Face metadata specification. The fields that are optional in the Hugging Face specification and are specific to the Hugging Face interface are omitted.
Another difference is that we divide our implementation into three separate parts.
system_card
, containing information about a group of ML-models which accomplish a specific task.model_card
, containing information about a specific data science model.assessment_card
, containing information about a regulatory assessment.Include statements
These model_card
s and assessment_card
s can be included verbatim into a system_card
, or referenced with an !include
statement, allowing minimal cards to remain compact in a single file, while extensive cards can be split up for readability and maintainability. Our standard allows the !include
to be used anywhere.
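The standard does not prescribe how !include is implemented. As an illustration only, a minimal sketch using PyYAML could look as follows; the file name is hypothetical, and included paths are resolved relative to the including file.

```python
import os
import yaml

class IncludeLoader(yaml.SafeLoader):
    """SafeLoader that understands !include tags."""
    def __init__(self, stream):
        # Resolve includes relative to the file being loaded, if known.
        self._root = os.path.dirname(getattr(stream, "name", "."))
        super().__init__(stream)

def _include(loader: IncludeLoader, node: yaml.Node):
    path = os.path.join(loader._root, loader.construct_scalar(node))
    with open(path) as f:
        return yaml.load(f, IncludeLoader)  # nested !includes keep working

IncludeLoader.add_constructor("!include", _include)

with open("system_card.yaml") as f:  # hypothetical file name
    system_card = yaml.load(f, IncludeLoader)
```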
The standard will be written in YAML. Example YAML files are given in the next section. The standard defines three cards: a system_card
, a model_card
and an assessment_card
. A system_card
contains information about an algorithmic system. It can have multiple models and each of these models should have a model_card
. Regulatory assessments can be processed in an assessment_card
. Note that model_card
s and assessment_card
s can be included directly into the system_card
or can be included as separate YAML files with the help of a YAML-include mechanism. For clarity, the latter is preferred and is also used in the examples in the next section.
system_card
","text":"A system_card
contains the following information.
schema_version
(REQUIRED, string). Version of the schema used, for example \"0.1a2\".provenance
(OPTIONAL). In case this System Card is generated from another source file, this field can capture the historical context of the contents of this System Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this System Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, e.g. 2024-04-16T16:48:14Z (a sketch for generating such a provenance block follows this field list)
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of person that initiated the transformations.name
(OPTIONAL, string). Name used to describe the system.
upl
(OPTIONAL, string). If this algorithm is part of a product offered by the Dutch Government, it should contain a URI from the Uniform Product List.owners
(OPTIONAL, list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the system. If oin
is NOT provided, this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.description
(OPTIONAL, string). A short description of the system.
labels
(OPTIONAL, list). This field allows storing meta-information about a system. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). Name of the label.value
(OPTIONAL, string). Value of the label.status
(OPTIONAL, string). The status of the system. For example, the status can be \"production\".
publication_category
(OPTIONAL, enum[string]). The publication category of the algorithm should be chosen from [\"high_risk\", \"other\"]
.begin_date
(OPTIONAL, string). The first date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.end_date
(OPTIONAL, string). The last date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.goal_and_impact
(OPTIONAL, string). The purpose of the system and the impact it has on citizens and companies.considerations
(OPTIONAL, string). The pros and cons of using the system.risk_management
(OPTIONAL, string). Description of the risks associated with the system.human_intervention
(OPTIONAL, string). A description of the extent to which there is human involvement in the system.legal_base
(OPTIONAL, list). If there exists a legal base for the process the system is embedded in, this field can be filled in with the relevant laws. There can be multiple legal bases. For each legal base the following fields are present.
name
(OPTIONAL, string). Name of the law.link
(OPTIONAL, string). URI pointing towards the contents of the law.used_data
(OPTIONAL, string). An overview of the data that is used in the system.
technical_design
(OPTIONAL, string). Description of how the system works.external_providers
(OPTIONAL, list). If relevant, this field allows storing information about external providers. There can be multiple external providers.
name
(OPTIONAL, string). Name of the external provider.version
(OPTIONAL, string). Version of the external provider reflecting its relation to previous versions.references
(OPTIONAL, list[string]). Additional reference URIs pointing to relevant information about the system.
interaction_details
(OPTIONAL, list[string]). Explain how the AI system interacts with hardware or software, including other AI systems, or how the AI system can be used to interact with hardware or software.version_requirements
(OPTIONAL, list[string]). Describe the versions of the relevant software or firmware, and any requirements related to version updates.deployment_variants
(OPTIONAL, list[string]). Description of all the forms in which the AI system is placed on the market or put into service, such as software packages embedded into hardware, downloads, or APIs.hardware_requirements
(OPTIONAL, list[string]). Provide a description of the hardware on which the AI system must be run.product_markings
(OPTIONAL, list[string]). If the AI system is a component of products, photos, or illustrations, describe the external features, markings, and internal layout of those products.user_interface
(OPTIONAL, list). Information on the user interface provided to the user responsible for operating the system.
description
(OPTIONAL, string). A description of the provided user interface.link
(OPTIONAL, string). A link to the user interface can be included.snapshot
(OPTIONAL, string). A snapshot/screenshot of the user interface can be included with the use of a hyperlink.models
(OPTIONAL, list[ModelCard]). A list of model cards (as defined below) or !include
s of a YAML file containing a model card. This model card can for example be a model card described in the next section or a model card from Hugging Face. There can be multiple model cards, meaning multiple models are used.
assessments
(OPTIONAL, list[AssessmentCard]). A list of assessment cards (as defined below) or !include
s of a YAML file containing an assessment card, as described in the next section. There can be multiple assessment cards, meaning multiple assessments were performed.
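As referenced above, here is a sketch, our own illustration rather than a normative tool, of generating the provenance block of a card, assuming the card is produced by a transformation running inside a git repository:

```python
import subprocess
from datetime import datetime, timezone

def provenance(uri: str, author: str) -> dict:
    """Build the provenance block described in the field list above."""
    commit = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return {
        "git_commit_hash": commit,
        # ISO 8601 in UTC, with the zero offset represented as Z:
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "uri": uri,
        "author": author,
    }

print(provenance("https://github.com/MinBZK/tad-conversion-tool", "John Doe"))
```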
model_card
","text":"A model_card
contains the following information.
provenance
(OPTIONAL). In case this Model Card is generated from another source file, this field can capture the historical context of the contents of this Model Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this Model Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, e.g. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of person that initiated the transformations.language
(OPTIONAL, list[string]). If relevant, the natural languages the model supports, as ISO 639 codes. There can be multiple languages.
license
(REQUIRED).
license_name
(REQUIRED, string). Any license from the open source license list1. If the license is NOT present in the license list, this field must be set to 'other' and the following two fields will be REQUIRED.license_link
(OPTIONAL, string). A link to a file of that name inside the repo, or a URL to a remote file containing the license contents.tags
(OPTIONAL, list[string]). Tags with keywords to describe the project. There can be multiple tags.
owners
(OPTIONAL, list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the model. If oin
is NOT provided, this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.model_index
(REQUIRED, list). There can be multiple models. For each model the following fields are present.
name
(REQUIRED, string). The name of the model.model
(REQUIRED, string). A URI pointing to a repository containing the model file.artifacts
(OPTIONAL, list). A list of artifacts. For each artifact the following fields are present.
uri
(OPTIONAL, string). URI referring to a relevant model artifact.content-type
(OPTIONAL, string). Optional content type, following the Content-Type header convention. Recognized values include \"application/onnx\", referring to an ONNX representation of the model.md5-checksum
(OPTIONAL, string). Optional MD5 checksum of the contents of the file.parameters
(OPTIONAL, list). There can be multiple parameters. For each parameter the following fields are present.
name
(REQUIRED, string). The name of the parameter, for example \"epochs\".dtype
(OPTIONAL, string). The datatype of the parameter, for example \"int\".value
(OPTIONAL, string). The value of the parameter, for example 100.labels
(OPTIONAL, list). This field allows storing meta-information about a parameter. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the label.dtype
(OPTIONAL, string). The datatype of the feature. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the feature. If name
is set, this field is REQUIRED.results
(OPTIONAL, list). There can be multiple results. For each result the following fields are present.
task
(OPTIONAL, list).
task_type
(REQUIRED, string). The task of the model, for example \"object-classification\".task_name
(OPTIONAL, string). A pretty name for the model task, for example \"Object Classification\".datasets
(OPTIONAL, list). There can be multiple datasets 2. For each dataset the following fields are present.
type
(REQUIRED, string). The type of the dataset, can be a dataset id from Hugging Face datasets or any other link to a repository containing the dataset3, for example \"common_voice\".name
(REQUIRED, string). A pretty name for the dataset, for example \"Common Voice (French)\".split
(OPTIONAL, string). The split of the dataset, for example \"train\".features
(OPTIONAL, list[string]). List of feature names.revision
(OPTIONAL, string). Version of the dataset, for example \"5503434ddd753f426f4b38109466949a1217c2bb\".metrics
(OPTIONAL, list). There can be multiple metrics. For each metric the following fields are present.
type
(REQUIRED, string). A metric-id from Hugging Face metrics4, for example accuracy.name
(REQUIRED, string). A descriptive name of the metric. For example \"false positive rate\" is not a descriptive name, but \"training false positive rate w.r.t class x\" is.dtype
(REQUIRED, string). The data type of the metric, for example float
.value
(REQUIRED, string). The value of the metric.labels
(OPTIONAL, list). This field allows storing meta-information about a metric. Metrics can, for example, be computed on subgroups of specific features, such as the accuracy for examples where the feature \"gender\" is set to \"male\". When there are multiple subgroups, the metric is computed on the intersection of those subgroups. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the feature. For example: \"gender\".type
(OPTIONAL, string). The type of the label. Can for example be set to \"feature\" or \"output_class\". If name
is set, this field is REQUIRED.dtype
(OPTIONAL, string). The datatype of the feature, for example float
. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the feature. If name
is set, this field is REQUIRED. For example: \"male\".measurements
.
bar_plots
(OPTIONAL, list). The purpose of this field is to capture bar-plot-like measurements, for example SHAP values. There can be multiple bar plots. For each bar plot the following fields are present.
type
(REQUIRED, string). The type of bar plot, for example \"SHAP\".name
(OPTIONAL, string). A pretty name for the plot, for example \"Mean Absolute SHAP Values\".results
(REQUIRED, list). The contents of the bar plot. A result represents a bar. There can be multiple results. For each result the following fields are present.
name
(REQUIRED, string). The name of the bar.value
(REQUIRED, float). The value of the corresponding bar.graph_plots
(OPTIONAL, list). The purpose of this field is to capture graph-plot-like measurements, such as partial dependence plots. There can be multiple graph plots. For each graph plot the following fields are present.
type
(REQUIRED, string). The type of the graph plot, for example \"partial_dependence\".name
(OPTIONAL, string). A pretty name of the graph, for example \"Partial Dependence Plot\".results
(REQUIRED, list). This field contains the graph plot data. Each graph can depend on a specific output class and feature. There can be multiple results. For each result the following fields are present.
class
(OPTIONAL, string/int/float/bool). The output class name that the graph corresponds to. This field is not always present.feature
(REQUIRED, string). The feature the graph corresponds to. This is required, since all relevant graphs are dependent on features.data
(REQUIRED, list)
x_value
(REQUIRED, float). The $x$-value of the graph.y_value
(REQUIRED, float). The $y$-value of the graph.assessment_card
","text":"An assessment_card
contains the following information.
provenance
(OPTIONAL). In case this Assessment Card is generated from another source file, this field can capture the historical context of the contents of this Assessment Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this Assessment Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, e.g. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of person that initiated the transformations.name
(REQUIRED, string). The name of the assessment.
date
(REQUIRED, string). The date at which the assessment is completed. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.contents
(REQUIRED, list). There can be multiple items in contents. For each item the following fields are present:
question
(REQUIRED, string). A question.answer
(REQUIRED, string). An answer.remarks
(OPTIONAL, string). A field to put relevant discussion remarks in.authors
(OPTIONAL, list). There can be multiple names. For each name the following field is present.
name
(OPTIONAL, string). The name of the author of the question.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of the answer. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, e.g. 2024-04-16T16:48:14Z
.
schema_version: {system_card_version}\nprovenance:\n git_commit_hash: {git_commit_hash}\n timestamp: {modification_timestamp}\n uri: {modification_uri}\n author: {modification_author}\nname: {system_name}\nupl: {upl_uri}\nowners:\n - oin: {oin}\n organization: {organization_name}\n name: {owner_name}\n email: {owner_email}\n role: {owner_role}\ndescription: {system_description}\nlabels:\n - name: {label_name}\n value: {label_value}\nstatus: {system_status}\npublication_category: {system_publication_cat}\nbegin_date: {system_begin_date}\nend_date: {system_end_date}\ngoal_and_impact: {system_goal_and_impact}\nconsiderations: {system_considerations}\nrisk_management: {system_risk_management}\nhuman_intervention: {system_human_intervention}\nlegal_base:\n - name: {law_name}\n link: {law_uri}\nused_data: {system_used_data}\ntechnical_design: {technical_design}\nexternal_providers:\n - name: {name_external_provider}\n version: {version_external_provider}\nreferences:\n - {reference_uri}\ninteraction_details:\n - {system_interaction_details}\nversion_requirements:\n - {system_version_requirements}\ndeployment_variants:\n - {system_deployment_variants}\nhardware_requirements:\n - {system_hardware_requirements}\nproduct_markings:\n - {system_product_markings}\nuser_interface:\n - description: {system_user_interface}\n link: {system_user_interface_uri}\n snapshot: {system_user_interface_snapshot_uri}\n\nmodels:\n - !include {model_card_uri}\n\nassessments:\n - !include {assessment_card_uri}\n
"},{"location":"projects/tad/reporting-standard/0.1a6/#model-card","title":"Model Card","text":"provenance:\n git_commit_hash: {git_commit_hash}\n timestamp: {modification_timestamp}\n uri: {modification_uri}\n author: {modification_author}\nlanguage:\n - {lang_0}\nlicense:\n license_name: {license_name}\n license_link: {license_uri}\ntags:\n - {tag_0}\nowners:\n - oin: {oin}\n organization: {organization_name}\n name: {owner_name}\n email: {owner_email}\n role: {owner_role}\n\nmodel-index:\n - name: {model_id}\n model: {model_uri}\n artifacts:\n - uri: {model_artifact_uri}\n - content-type: {model_artifact_type}\n - md5-checksum: {md5_checksum}\n parameters:\n - name: {parameter_name}\n dtype: {parameter_dtype}\n value: {parameter_value}\n labels:\n - name: {label_name}\n dtype: {label_type}\n value: {label_value}\n results:\n - task:\n - type: {task_type}\n name: {task_name}\n datasets:\n - type: {dataset_type}\n name: {dataset_name}\n split: {split}\n features:\n - {feature_name}\n revision: {dataset_version}\n metrics:\n - type: {metric_type}\n name: {metric_name}\n dtype: {metric_dtype}\n value: {metric_value}\n labels:\n - name: {label_name}\n type: {label_type}\n dtype: {label_type}\n value: {label_value}\n measurements:\n bar_plots:\n - type: {measurement_type}\n name: {measurement_name}\n results:\n - name: {bar_name}\n value: {bar_value}\n graph_plots:\n - type: {measurement_type}\n name: {measurement_name}\n results:\n - class: {class_name}\n feature: {feature_name}\n data:\n - x_value: {x_value}\n y_value: {y_value}\n
"},{"location":"projects/tad/reporting-standard/0.1a6/#assessment-card","title":"Assessment Card","text":"provenance:\n git_commit_hash: {git_commit_hash}\n timestamp: {modification_timestamp}\n uri: {modification_uri}\n author: {modification_author}\nname: {assessment_name}\ndate: {assessment_date}\ncontents:\n - question: {question_text}\n answer: {answer_text}\n remarks: {remarks_text}\n authors:\n - name: {author_name}\n timestamp: {timestamp}\n
"},{"location":"projects/tad/reporting-standard/0.1a6/#schema","title":"Schema","text":"JSON schema will be added when we publish the first beta version.
"},{"location":"projects/tad/reporting-standard/0.1a6/#changelog","title":"Changelog","text":"Deviation from the Hugging Face specification is in the License field. Hugging Face only accepts dataset id's from Hugging Face license list while we accept any license from Open Source License List.\u00a0\u21a9\u21a9
Deviation from the Hugging Face specification is in the model_index:results:dataset
field. Hugging Face only accepts one dataset, while we accept a list of datasets.\u00a0\u21a9\u21a9
Deviation from the Hugging Face specification is in the Dataset Type field. Hugging Face only accepts dataset id's from Hugging Face datasets while we also allow for any url pointing to the dataset.\u00a0\u21a9\u21a9
For this extension to work relevant metrics (such as for example false positive rate) have to be added to the Hugging Face metrics, possibly this can be done in our organizational namespace.\u00a0\u21a9\u21a9
This document describes the Transparency of Algorithmic Decision making (TAD) Reporting Standard.
For reproducibility, governance, auditing, and sharing of algorithmic systems, it is essential to have a reporting standard that defines how information about an algorithmic system is shared. This reporting standard describes how information about the different phases of an algorithm's life cycle can be reported. It contains, among other things, descriptive information combined with information about the technical tests and assessments applied.
Disclaimer
The TAD Reporting Standard is a work in progress: the current standard is likely suboptimal and will change significantly in future versions.
"},{"location":"projects/tad/reporting-standard/latest/#introduction","title":"Introduction","text":"Inspired by Model Cards for Model Reporting and Papers with Code Model Index this standard almost1 2 3 4 extends the Hugging Face model card metadata specification to allow for:
metrics_field
from the Hugging Face metadata specification.measurements
.assessments
.Following Hugging Face, this proposed standard will be written in YAML.
This standard does not contain all fields present in the Hugging Face metadata specification. The fields that are optional in the Hugging Face specification and are specific to the Hugging Face interface are omitted.
Another difference is that we divide our implementation into three separate parts.
system_card
, containing information about a group of ML-models which accomplish a specific task.model_card
, containing information about a specific data science model.assessment_card
, containing information about a regulatory assessment.Include statements
These model_card
s and assessment_card
s can be included verbatim into a system_card
, or referenced with an !include
statement, allowing minimal cards to remain compact in a single file, while extensive cards can be split up for readability and maintainability. Our standard allows the !include
to be used anywhere.
The standard will be written in YAML. Example YAML files are given in the next section. The standard defines three cards: a system_card
, a model_card
and an assessment_card
. A system_card
contains information about an algorithmic system. It can have multiple models and each of these models should have a model_card
. Regulatory assessments can be processed in an assessment_card
. Note that model_card
s and assessment_card
s can be included directly into the system_card
or can be included as separate YAML files with the help of a YAML-include mechanism. For clarity, the latter is preferred and is also used in the examples in the next section.
system_card
","text":"A system_card
contains the following information.
schema_version
(REQUIRED, string). Version of the schema used, for example \"0.1a2\".provenance
(OPTIONAL). In case this System Card is generated from another source file, this field can capture the historical context of the contents of this System Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this System Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, e.g. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of person that initiated the transformations.name
(OPTIONAL, string). Name used to describe the system.
upl
(OPTIONAL, string). If this algorithm is part of a product offered by the Dutch Government, it should contain a URI from the Uniform Product List.owners
(OPTIONAL, list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the system. If oin
is NOT provided, this field is REQUIRED (see the sketch after this field list).name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.description
(OPTIONAL, string). A short description of the system.
labels
(OPTIONAL, list). This field allows storing meta-information about a system. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). Name of the label.value
(OPTIONAL, string). Value of the label.status
(OPTIONAL, string). The status of the system. For example, the status can be \"production\".
publication_category
(OPTIONAL, enum[string]). The publication category of the algorithm should be chosen from [\"high_risk\", \"other\"]
.begin_date
(OPTIONAL, string). The first date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.end_date
(OPTIONAL, string). The last date the system was used. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.goal_and_impact
(OPTIONAL, string). The purpose of the system and the impact it has on citizens and companies.considerations
(OPTIONAL, string). The pros and cons of using the system.risk_management
(OPTIONAL, string). Description of the risks associated with the system.human_intervention
(OPTIONAL, string). A description of the extent to which there is human involvement in the system.legal_base
(OPTIONAL, list). If there exists a legal base for the process the system is embedded in, this field can be filled in with the relevant laws. There can be multiple legal bases. For each legal base the following fields are present.
name
(OPTIONAL, string). Name of the law.link
(OPTIONAL, string). URI pointing towards the contents of the law.used_data
(OPTIONAL, string). An overview of the data that is used in the system.
technical_design
(OPTIONAL, string). Description of how the system works.external_providers
(OPTIONAL, list). If relevant, this field allows storing information about external providers. There can be multiple external providers.
name
(OPTIONAL, string). Name of the external provider.version
(OPTIONAL, string). Version of the external provider reflecting its relation to previous versions.references
(OPTIONAL, list[string]). Additional reference URIs pointing to relevant information about the system.
interaction_details
(OPTIONAL, list[string]). Explain how the AI system interacts with hardware or software, including other AI systems, or how the AI system can be used to interact with hardware or software.version_requirements
(OPTIONAL, list[string]). Describe the versions of the relevant software or firmware, and any requirements related to version updates.deployment_variants
(OPTIONAL, list[string]). Description of all the forms in which the AI system is placed on the market or put into service, such as software packages embedded into hardware, downloads, or APIs.hardware_requirements
(OPTIONAL, list[string]). Provide a description of the hardware on which the AI system must be run.product_markings
(OPTIONAL, list[string]). If the AI system is a component of products, photos, or illustrations, describe the external features, markings, and internal layout of those products.user_interface
(OPTIONAL, list). Information on the user interface provided to the user responsible for operating the system.
description
(OPTIONAL, string). A description of the provided user interface.link
(OPTIONAL, string). A link to the user interface can be included.snapshot
(OPTIONAL, string). A snapshot/screenshot of the user interface can be included with the use of a hyperlink.models
(OPTIONAL, list[ModelCard]). A list of model cards (as defined below) or !include
s of a YAML file containing a model card. This model card can for example be a model card described in the next section or a model card from Hugging Face. There can be multiple model cards, meaning multiple models are used.
assessments
(OPTIONAL, list[AssessmentCard]). A list of assessment cards (as defined below) or !include
s of a YAML file containing an assessment card, as described in the next section. There can be multiple assessment cards, meaning multiple assessments were performed.
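As referenced in the owners field above, a minimal sketch (our own illustration, not a normative validator) of checking two of the rules in this field list, the oin/organization rule and the ISO 8601 dates, could look like this:

```python
from datetime import date

def check_system_card(card: dict) -> list[str]:
    """Report violations of two system_card rules from the field list above."""
    problems = []
    for owner in card.get("owners") or []:
        if not owner.get("oin") and not owner.get("organization"):
            problems.append("owner: 'organization' is REQUIRED when 'oin' is not provided")
    for field in ("begin_date", "end_date"):
        value = card.get(field)
        if value is None:
            continue
        try:
            date.fromisoformat(value)  # accepts ISO 8601 dates (YYYY-MM-DD)
        except ValueError:
            problems.append(f"{field}: not ISO 8601 (YYYY-MM-DD): {value!r}")
    return problems

print(check_system_card({"owners": [{"name": "John Doe"}], "begin_date": "01-05-2023"}))
```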
model_card
","text":"A model_card
contains the following information.
provenance
(OPTIONAL). In case this Model Card is generated from another source file, this field can capture the historical context of the contents of this Model Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this Model Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, e.g. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of person that initiated the transformations.language
(OPTIONAL, list[string]). If relevant, the natural languages the model supports, as ISO 639 codes. There can be multiple languages.
license
(REQUIRED).
license_name
(REQUIRED, string). Any license from the open source license list1. If the license is NOT present in the license list, this field must be set to 'other' and the following two fields will be REQUIRED.license_link
(OPTIONAL, string). A link to a file of that name inside the repo, or a URL to a remote file containing the license contents.tags
(OPTIONAL, list[string]). Tags with keywords to describe the project. There can be multiple tags.
owners
(OPTIONAL, list). There can be multiple owners. For each owner the following fields are present.
oin
(OPTIONAL, string). If applicable the Organisatie-identificatienummer (OIN).organization
(OPTIONAL, string). Name of the organization that owns the model. If oin
is NOT provided, this field is REQUIRED.name
(OPTIONAL, string). Name of a contact person within the organization.email
(OPTIONAL, string). Email address of the contact person or organization.role
(OPTIONAL, string). Role of the contact person. This field should only be set when the name
field is set.model_index
(REQUIRED, list). There can be multiple models. For each model the following fields are present.
name
(REQUIRED, string). The name of the model.model
(REQUIRED, string). A URI pointing to a repository containing the model file.artifacts
(OPTIONAL, list). A list of artifacts. For each artifact the following fields are present.
uri
(OPTIONAL, string). URI referring to a relevant model artifact.content-type
(OPTIONAL, string). Optional content type, following the Content-Type header convention. Recognized values include \"application/onnx\", referring to an ONNX representation of the model.md5-checksum
(OPTIONAL, string). Optional MD5 checksum of the contents of the file.parameters
(OPTIONAL, list). There can be multiple parameters. For each parameter the following fields are present.
name
(REQUIRED, string). The name of the parameter, for example \"epochs\".dtype
(OPTIONAL, string). The datatype of the parameter, for example \"int\".value
(OPTIONAL, string). The value of the parameter, for example 100.labels
(OPTIONAL, list). This field allows storing meta-information about a parameter. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the label.dtype
(OPTIONAL, string). The datatype of the feature. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the feature. If name
is set, this field is REQUIRED.results
(OPTIONAL, list). There can be multiple results. For each result the following fields are present.
task
(OPTIONAL, list).
task_type
(REQUIRED, string). The task of the model, for example \"object-classification\".task_name
(OPTIONAL, string). A pretty name for the model task, for example \"Object Classification\".datasets
(OPTIONAL, list). There can be multiple datasets 2. For each dataset the following fields are present.
type
(REQUIRED, string). The type of the dataset, can be a dataset id from Hugging Face datasets or any other link to a repository containing the dataset3, for example \"common_voice\".name
(REQUIRED, string). A pretty name for the dataset, for example \"Common Voice (French)\".split
(OPTIONAL, string). The split of the dataset, for example \"train\".features
(OPTIONAL, list[string]). List of feature names.revision
(OPTIONAL, string). Version of the dataset, for example \"5503434ddd753f426f4b38109466949a1217c2bb\".metrics
(OPTIONAL, list). There can be multiple metrics. For each metric the following fields are present.
type
(REQUIRED, string). A metric-id from Hugging Face metrics4, for example accuracy.name
(REQUIRED, string). A descriptive name of the metric. For example \"false positive rate\" is not a descriptive name, but \"training false positive rate w.r.t class x\" is.dtype
(REQUIRED, string). The data type of the metric, for example float
.value
(REQUIRED, string). The value of the metric.labels
(OPTIONAL, list). This field allows storing meta-information about a metric. Metrics can, for example, be computed on subgroups of specific features, such as the accuracy for examples where the feature \"gender\" is set to \"male\". When there are multiple subgroups, the metric is computed on the intersection of those subgroups. There can be multiple labels. For each label the following fields are present.
name
(OPTIONAL, string). The name of the feature. For example: \"gender\".type
(OPTIONAL, string). The type of the label. Can for example be set to \"feature\" or \"output_class\". If name
is set, this field is REQUIRED.dtype
(OPTIONAL, string). The datatype of the feature, for example float
. If name
is set, this field is REQUIRED.value
(OPTIONAL, string). The value of the feature. If name
is set, this field is REQUIRED. For example: \"male\".measurements
.
bar_plots
(OPTIONAL, list). The purpose of this field is to capture bar-plot-like measurements, for example SHAP values. There can be multiple bar plots. For each bar plot the following fields are present.
type
(REQUIRED, string). The type of bar plot, for example \"SHAP\".name
(OPTIONAL, string). A pretty name for the plot, for example \"Mean Absolute SHAP Values\".results
(REQUIRED, list). The contents of the bar plot. A result represents a bar. There can be multiple results. For each result the following fields are present.
name
(REQUIRED, string). The name of the bar.value
(REQUIRED, float). The value of the corresponding bar.graph_plots
(OPTIONAL, list). The purpose of this field is to capture graph-plot-like measurements, such as partial dependence plots. There can be multiple graph plots. For each graph plot the following fields are present.
type
(REQUIRED, string). The type of the graph plot, for example \"partial_dependence\".name
(OPTIONAL, string). A pretty name of the graph, for example \"Partial Dependence Plot\".results
(REQUIRED, list). This field contains the graph plot data. Each graph can depend on a specific output class and feature. There can be multiple results. For each result the following fields are present.
class
(OPTIONAL, string/int/float/bool). The output class name that the graph corresponds to. This field is not always present.feature
(REQUIRED, string). The feature the graph corresponds to. This is required, since all relevant graphs are dependent on features.data
(REQUIRED, list)
x_value
(REQUIRED, float). The $x$-value of the graph.y_value
(REQUIRED, float). The $y$-value of the graph.assessment_card
","text":"An assessment_card
contains the following information.
provenance
(OPTIONAL). In case this Assessment Card is generated from another source file, this field can capture the historical context of the contents of this Assessment Card.
git_commit_hash
(OPTIONAL, string). Git commit hash of the commit which contains the transformation file used to create this card.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of generation of this Assessment Card. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, e.g. 2024-04-16T16:48:14Z
.uri
(OPTIONAL, string). URI to the tool that was used to perform the transformations.author
(OPTIONAL, string). Name of person that initiated the transformations.name
(REQUIRED, string). The name of the assessment.
urn
(OPTIONAL, string). A Uniform Resource Name (URN) of the instrument in the instrument register.date
(REQUIRED, string). The date at which the assessment is completed. Date should be given in ISO 8601 format, i.e. YYYY-MM-DD
.contents
(REQUIRED, list). There can be multiple items in contents. For each item the following fields are present:
question
(REQUIRED, string). A question.urn
(OPTIONAL, string). A Uniform Resource Name (URN) of the corresponding task in the instrument register.answer
(REQUIRED, string). An answer.remarks
(OPTIONAL, string). A field to put relevant discussion remarks in.authors
(OPTIONAL, list). There can be multiple names. For each name the following field is present.
name
(OPTIONAL, string). The name of the author of the question.timestamp
(OPTIONAL, string). A timestamp of the date, time and timezone of the answer. Timestamp should be given, preferably in UTC (represented as Z
), in ISO 8601 format, e.g. 2024-04-16T16:48:14Z
.
schema_version: {system_card_version}\nprovenance:\n git_commit_hash: {git_commit_hash}\n timestamp: {modification_timestamp}\n uri: {modification_uri}\n author: {modification_author}\nname: {system_name}\nupl: {upl_uri}\nowners:\n - oin: {oin}\n organization: {organization_name}\n name: {owner_name}\n email: {owner_email}\n role: {owner_role}\ndescription: {system_description}\nlabels:\n - name: {label_name}\n value: {label_value}\nstatus: {system_status}\npublication_category: {system_publication_cat}\nbegin_date: {system_begin_date}\nend_date: {system_end_date}\ngoal_and_impact: {system_goal_and_impact}\nconsiderations: {system_considerations}\nrisk_management: {system_risk_management}\nhuman_intervention: {system_human_intervention}\nlegal_base:\n - name: {law_name}\n link: {law_uri}\nused_data: {system_used_data}\ntechnical_design: {technical_design}\nexternal_providers:\n - name: {name_external_provider}\n version: {version_external_provider}\nreferences:\n - {reference_uri}\ninteraction_details:\n - {system_interaction_details}\nversion_requirements:\n - {system_version_requirements}\ndeployment_variants:\n - {system_deployment_variants}\nhardware_requirements:\n - {system_hardware_requirements}\nproduct_markings:\n - {system_product_markings}\nuser_interface:\n - description: {system_user_interface}\n link: {system_user_interface_uri}\n snapshot: {system_user_interface_snapshot_uri}\n\nmodels:\n - !include {model_card_uri}\n\nassessments:\n - !include {assessment_card_uri}\n
"},{"location":"projects/tad/reporting-standard/latest/#model-card","title":"Model Card","text":"provenance:\n git_commit_hash: {git_commit_hash}\n timestamp: {modification_timestamp}\n uri: {modification_uri}\n author: {modification_author}\nlanguage:\n - {lang_0}\nlicense:\n license_name: {license_name}\n license_link: {license_uri}\ntags:\n - {tag_0}\nowners:\n - oin: {oin}\n organization: {organization_name}\n name: {owner_name}\n email: {owner_email}\n role: {owner_role}\n\nmodel-index:\n - name: {model_id}\n model: {model_uri}\n artifacts:\n - uri: {model_artifact_uri}\n - content-type: {model_artifact_type}\n - md5-checksum: {md5_checksum}\n parameters:\n - name: {parameter_name}\n dtype: {parameter_dtype}\n value: {parameter_value}\n labels:\n - name: {label_name}\n dtype: {label_type}\n value: {label_value}\n results:\n - task:\n - type: {task_type}\n name: {task_name}\n datasets:\n - type: {dataset_type}\n name: {dataset_name}\n split: {split}\n features:\n - {feature_name}\n revision: {dataset_version}\n metrics:\n - type: {metric_type}\n name: {metric_name}\n dtype: {metric_dtype}\n value: {metric_value}\n labels:\n - name: {label_name}\n type: {label_type}\n dtype: {label_type}\n value: {label_value}\n measurements:\n bar_plots:\n - type: {measurement_type}\n name: {measurement_name}\n results:\n - name: {bar_name}\n value: {bar_value}\n graph_plots:\n - type: {measurement_type}\n name: {measurement_name}\n results:\n - class: {class_name}\n feature: {feature_name}\n data:\n - x_value: {x_value}\n y_value: {y_value}\n
"},{"location":"projects/tad/reporting-standard/latest/#assessment-card","title":"Assessment Card","text":"provenance:\n git_commit_hash: {git_commit_hash}\n timestamp: {modification_timestamp}\n uri: {modification_uri}\n author: {modification_author}\nname: {assessment_name}\nurn: {urn}\ndate: {assessment_date}\ncontents:\n - question: {question_text}\n urn: {urn}\n answer: {answer_text}\n remarks: {remarks_text}\n authors:\n - name: {author_name}\n timestamp: {timestamp}\n
"},{"location":"projects/tad/reporting-standard/latest/#schema","title":"Schema","text":"JSON schema will be added when we publish the first beta version.
"},{"location":"projects/tad/reporting-standard/latest/#changelog","title":"Changelog","text":"Deviation from the Hugging Face specification is in the License field. Hugging Face only accepts dataset id's from Hugging Face license list while we accept any license from Open Source License List.\u00a0\u21a9\u21a9
Deviation from the Hugging Face specification is in the model_index:results:dataset
field. Hugging Face only accepts one dataset, while we accept a list of datasets.\u00a0\u21a9\u21a9
Deviation from the Hugging Face specification is in the Dataset Type field. Hugging Face only accepts dataset id's from Hugging Face datasets while we also allow for any url pointing to the dataset.\u00a0\u21a9\u21a9
For this extension to work relevant metrics (such as for example false positive rate) have to be added to the Hugging Face metrics, possibly this can be done in our organizational namespace.\u00a0\u21a9\u21a9
The purpose of a code review is to ensure the quality and readability of a change, and that all requirements from the ticket have been met, before it gets merged into the main codebase. Additionally, code reviews are a communication tool: they allow team members to stay aware of changes being made.
Code reviews involve having a team member examine the changes made by another team member and give feedback or ask questions if needed.
"},{"location":"way-of-working/code-reviews/#creating-a-pull-request","title":"Creating a Pull Request","text":"We use GitHub pull requests (PR) for code reviews. You can make a draft PR if your work is still in progress. When you are done you can remove the draft status. A team member may start reviewing when the PR does not have a draft status.
For team ADRs, at least 3 approving reviews are required; if the ADR can be expected to be controversial, all team members should approve.
A team ADR is an ADR made in the ai-validation repository.
All other PRs need at least 1 approving review, but can have more reviewers if desired (by either the reviewer or the author).
"},{"location":"way-of-working/code-reviews/#review-process","title":"Review process","text":"By default the codeowner, indicated in the CODEOWNER file, will be requested to review. For us this is the GitHub team AI-validation. If the PR creator wants a specific team member to review, the PR creator should add the team member specifically in the reviewers section of the PR. A message in Mattermost will be posted for PRs. Then with the reaction of an emoji a reviewer will indicate they are looking at the PR.
If the reviewer has suggestions or comments, the PR creator can fix those or respond to the suggestions. When the creator of the PR thinks they are done with the feedback, they must re-request a review from the person that did the review. The reviewer must then look at the changes and approve or add more comments. This process continues until the reviewer agrees that everything is correct and approves the PR.
Once the review is approved, the reviewer checks if the branch is in sync with the main branch before merging. If not, the reviewer rebases the branch. Once the branch is in sync with main, the reviewer merges the PR and checks if the deployment is successful. If the deployment is not successful, the reviewer fixes it. If the PR needs more than one review, the last approving reviewer merges the PR.
"},{"location":"way-of-working/contributing/","title":"Contributing to AI Validation","text":"First off, thanks for taking the time to contribute! \u2764\ufe0f
All types of contributions are encouraged and valued. See the Table of Contents for different ways to help and details about how this project handles them. Please make sure to read the relevant section before making your contribution. It will make it a lot easier for us maintainers and smooth out the experience for all involved. The community looks forward to your contributions. \ud83c\udf89
"},{"location":"way-of-working/contributing/#table-of-contents","title":"Table of Contents","text":"This project and everyone participating in it is governed by the Code of Conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior to ai-validatie@minbzk.nl.
"},{"location":"way-of-working/contributing/#i-have-a-question","title":"I Have a Question","text":"Before you ask a question, it is best to search for existing Issues that might help you. In case you have found a suitable issue and still need clarification, you can write your question in this issue.
If you then still feel the need to ask a question and need clarification, we recommend the following:
We will then take care of the issue as soon as possible.
"},{"location":"way-of-working/contributing/#i-want-to-contribute","title":"I Want To Contribute","text":""},{"location":"way-of-working/contributing/#legal-notice","title":"Legal Notice","text":"When contributing to this project, you must agree that you have authored 100% of the content, that you have the necessary rights to the content and that the content you contribute may be provided under the project license.
"},{"location":"way-of-working/contributing/#reporting-bugs","title":"Reporting Bugs","text":""},{"location":"way-of-working/contributing/#before-submitting-a-bug-report","title":"Before Submitting a Bug Report","text":"A good bug report shouldn't leave others needing to chase you up for more information. Therefore, we ask you to investigate carefully, collect information and describe the issue in detail in your report. Please complete the following steps in advance to help us fix any potential bug as fast as possible.
You must never report security-related issues, vulnerabilities or bugs that include sensitive information to the issue tracker or elsewhere in public. Instead, sensitive bugs must be sent by email to ai-validatie@minbzk.nl.
We use GitHub issues to track bugs and errors. If you run into an issue with the project:
Once it's filed:
needs-repro
. Bugs with the needs-repro
tag will not be addressed until they are reproduced.needs-fix
, as well as possibly other tags (such as critical
), and the issue will be left to be implemented by someone.This section guides you through submitting an enhancement suggestion for this project, including completely new features and minor improvements. Following these guidelines will help maintainers and the community to understand your suggestion and find related suggestions.
"},{"location":"way-of-working/contributing/#before-submitting-an-enhancement","title":"Before Submitting an Enhancement","text":"Enhancement suggestions are tracked as GitHub issues.
We have commit message conventions: Commit convention
"},{"location":"way-of-working/contributing/#markdown-lint","title":"Markdown Lint","text":"We use Markdown lint to standardize Markdown: Markdown lint config.
"},{"location":"way-of-working/contributing/#pre-commit","title":"Pre-commit","text":"We use pre-commit to enabled standardization: pre-commit config.
"},{"location":"way-of-working/decision-log/","title":"Decision Log","text":"Throughout our work, small decisions about processes and approaches are often made in meetings and chats. While these aren't big enough for formal documentation like ADRs, capturing them is valuable for both current and future team members.
This log provides a reference point for those decisions.
"},{"location":"way-of-working/decision-log/#overview-of-decisions","title":"Overview of decisions","text":"We're sad to see you go! But if you do, here's what not to forget.
"},{"location":"way-of-working/off-boarding/#github","title":"GitHub","text":"For clarity and consistency, this document defines some terms used within our team where the meaning in Data Science or Computer Science differs, and terms that are for any reason good to mention.
For a full reference for Machine Learning, we recommend ML Fundamentals from Google.
"},{"location":"way-of-working/onboarding/","title":"Onboarding","text":"Make sure you have installed Mattermost, then follow these steps.
Make sure you have installed Webex, then follow these steps.
Make sure you have installed Tuple, then follow these steps.
Create or use your existing GitHub account.
Bookmark these links in your browser:
We use the HashiCorp Vault secrets manager for team secrets. You can log in with a GitHub personal access token. The token needs organization read permissions (read:org
), and you should be part of our GitHub team to access the vault.
We are assuming your dev machine is a Mac. This guide is rather opinionated; feel free to have your own opinion, and feel free to contribute! Contributing can be done by clicking \"edit\" at the top right and making a pull request on this repository.
"},{"location":"way-of-working/onboarding/dev-machine/#things-that-should-have-been-default-on-mac","title":"Things that should have been default on Mac","text":"Homebrew as the missing Package Manager
/bin/bash -c \"$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\"\n
Rectangle
brew install --cask rectangle\n
WebEx for video conferencing
brew install --cask webex\n
Mattermost for team communication
brew install --cask mattermost\n
iTerm2
brew install --cask iterm2\n
Oh My Zsh
/bin/bash -c \"$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)\"\n
Autosuggestions for zsh
git clone https://github.com/zsh-users/zsh-autosuggestions ~/.oh-my-zsh/custom/plugins/zsh-autosuggestions\n
Fish-shell-like syntax highlighting for Zsh
brew install zsh-syntax-highlighting\n
Add plugins to your shell in ~/.zshrc
plugins=(\n # other plugins...\n zsh-autosuggestions\n kubectl\n docker\n docker-compose\n pyenv\n z\n)\n
Touch ID in Terminal
Sourcetree
brew install --cask sourcetree\n
Pyenv
brew install pyenv\n
pyenv virtualenv
brew install pyenv-virtualenv\n
pre-commit
brew install pre-commit\n
Xcode Command Line Tools
xcode-select --install\n
TabbyML, an open-source, self-hosted AI coding assistant
We cannot simply use hosted coding assistants because of privacy and copyright issues. We can, however, use self-hosted coding assistants, provided they are trained on data with permissive licenses.
The StarCoder (1-7B) models are all trained on version 1.2 of The Stack dataset, which boils down to all open GitHub code with permissive licenses (193 licenses in total), minus opt-out requests.
The Code Llama and DeepSeek models are not clear enough about their data licenses.
brew install tabbyml/tabby/tabby\ntabby serve --device metal --model TabbyML/StarCoder-3B\n
Then configure your IDE by installing a plugin.
Sign commits using SSH