Here we are documenting the processes and work of the AI Validation Team at the Ministry of the Interior and Kingdom Relations in The Netherlands.
We are a team of mostly engineers at a policy department.
Read our guide on how to contribute.
Our contact details are here.
In modern software development practices, the use of Architecture Decision Records (ADRs) has become increasingly common. ADRs are documents that capture important architectural decisions made during the development process. These decisions play a crucial role in guiding the development team and ensuring consistency and coherence in the architecture of the software system.
We will utilize ADRs in our team to document and communicate architectural decisions effectively. Furthermore, we will publish these ADRs publicly to promote transparency and facilitate collaboration.
Use the template below to add an ADR:
# ADR-XXXX Title

## Context

What is the issue that we're seeing that is motivating this decision or change?

## Assumptions

Anything that could cause problems if untrue now or later. (optional)

## Decision

What is the change that we're proposing and/or doing?

## Risks

Anything that could cause malfunction, delay, or other negative impacts. (optional)

## Consequences

What becomes easier or more difficult to do because of this change?

## More Information

Provide additional evidence/confidence for the decision outcome. Links to other decisions and resources may appear here as well. (optional)
In the landscape of software development, the choice of coding platform significantly impacts developer productivity, collaboration, and code quality. It's crucial to evaluate and select a coding platform that aligns with our development needs and fosters efficient workflows.
The following assumptions are made:
After careful consideration and evaluation of various options like GitHub, GitLab and BitBucket, we propose adopting GitHub as our primary coding platform. The decision is based on the following factors:
Costs: There are currently no costs associated with using GitHub for our use cases.
Features and Functionality: GitHub offers a comprehensive set of features essential for modern software development and collaboration with external teams, including version control, code review, issue tracking, continuous integration, and deployment automation.
Security: GitHub offers a complete set of security features essential to secure development like dependency management and security scanning.
Community and Ecosystem: GitHub boasts a vibrant community and ecosystem, facilitating knowledge sharing, collaboration, and access to third-party tools, and services that can enhance our development workflows. Within our organization we have easy access to the team managing the GitHub organization.
Usability and User Experience: A user-friendly interface and intuitive workflows are essential for maximizing developer productivity and minimizing onboarding time. GitHub offers a streamlined user experience and customizable workflows that align with our team's preferences and practices.
Currently, the MinBZK organization on GitHub has few members, indicating that our team is an early adopter of the platform within the organization. This might limit the features available to us due to cost constraints.
If we choose another tool in the future we need to migrate our codebase, and potentially need to rewrite some specific GitHub features that cannot be used in another tool.
Alternatives considered:
Our development team wants to implement a CI/CD solution to streamline the build, testing, and deployment workflows of our software products. Currently, our codebase resides on GitHub, and we leverage Kubernetes as our chosen orchestration platform, managed by the DigiLab platform team.
We will use the following tools for CI/CD pipeline:
GitHub Actions aligns with our existing infrastructure, ensuring seamless integration with our codebase and minimizing operational overhead. However, GitHub Actions' specific syntax for CI results in vendor lock-in, necessitating significant effort to migrate to an alternative CI system in the future.
Flux, being a GitOps operator for Kubernetes, offers a declarative approach to managing deployments, enhancing reliability and repeatability within our Kubernetes ecosystem.
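As an illustration, a minimal GitHub Actions workflow for the build-and-test part of such a pipeline might look like the sketch below. The file path, Python version, and commands are assumptions for a typical Python project, not the team's actual configuration; deployment itself would be handled by Flux reconciling the cluster state from a Git repository.

```yaml
# Hypothetical .github/workflows/ci.yml — illustrative only.
name: CI
on:
  push:
    branches: [main]
  pull_request:
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: pytest
```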
Our team recognizes the necessity of a platform to run our software, as our local machines lack the capacity to handle certain workloads effectively. We have evaluated several options available to us:
We operate under the following assumptions:
We will use Digilab Kubernetes for our workloads.
By choosing Digilab Kubernetes, we gain access to a namespace within their managed Kubernetes cluster. However, it's important to note that Digilab does not provide any guarantees regarding the availability of the cluster. Should our software require higher availability assurances, we may need to explore alternative solutions.
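For illustration, running a workload in a namespace on such a shared cluster might look like the sketch below. The namespace, application name, and image are placeholders, not actual resources.

```yaml
# Hypothetical Deployment in a namespace on a shared cluster — names are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
  namespace: ai-validation        # the namespace assigned by the platform team
spec:
  replicas: 2                     # some redundancy; the cluster itself gives no availability guarantees
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: example-app
          image: ghcr.io/example/example-app:latest
```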
In modern software development, maintaining code quality is crucial for readability, maintainability, and collaboration. Python, being a dynamically typed language, requires robust tooling to ensure code consistency and type safety. Manual enforcement of coding standards is time-consuming and error-prone. Hence, adopting automated tooling to streamline this process is imperative.
We will use these standards and tools for our own projects:
When working on external projects these coding standards will not always be possible, but we will try to integrate them as much as we can.
Improved Code Quality: Adoption of these tools will lead to improved code quality, consistency, and maintainability across the project.
Enhanced Developer Productivity: Automated code formatting and static type checking will reduce manual effort and free developers to focus more on coding logic rather than formatting and type-related issues.
Reduced Bug Incidence: Static typing and linting will catch potential bugs and issues early in the development process, reducing the likelihood of runtime errors and debugging efforts.
Standardized Development Workflow: By integrating pre-commit hooks, the development workflow will be standardized, ensuring that all developers follow the same code quality standards.
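As a sketch of how such tooling can tie together, a `.pre-commit-config.yaml` might look as follows. The specific hooks (ruff for linting and formatting, mypy for static type checking) are illustrative examples of this class of tool, not necessarily the team's exact toolset.

```yaml
# Illustrative .pre-commit-config.yaml — hooks and versions are examples.
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.4
    hooks:
      - id: ruff          # linting
      - id: ruff-format   # code formatting
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.10.0
    hooks:
      - id: mypy          # static type checking
```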
Our development team wants to enhance transparency and productivity in our software development processes. We are using GitHub for version control and collaboration. However, to further streamline our process, there is a need to incorporate tooling for managing the effort of our team.
We will use GitHub Projects as our agile process tool
GitHub Projects seamlessly integrates with our existing GitHub repositories, allowing us to manage our Agile processes within the same ecosystem where our code resides. This integration eliminates the need for additional third-party tools, simplifying our workflow.
In software development, maintaining clear and consistent commit message conventions is crucial for effective collaboration, code review, and project management. Commit messages serve as a form of documentation, helping developers understand the changes introduced by each commit without having to analyze the code diff extensively.
A commit message must follow the following rules:
<ref>-<ticketnumber>: subject line
An example of a commit message:
Fix foo to enable bar
or
AB-1234: Fix foo to enable bar
This fixes the broken behavior of component abc caused by problem xyz.
If we contribute to projects not started by us we try to follow the above standard unless a specific convention is obvious or required by the project.
In some repositories Conventional Commits are used. This ADR does not follow conventional commits.
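The convention above can be checked mechanically, for example in a commit-msg hook. The sketch below shows a minimal validator in Python; the exact pattern is an assumption inferred from the examples in this ADR (an optional uppercase ref and ticket number, then a subject line), not an official specification.

```python
import re

# Assumed pattern: optional "<REF>-<ticketnumber>: " prefix, then a non-empty subject.
PATTERN = re.compile(r"^(?:[A-Z]+-\d+: )?\S.*$")

def is_valid_subject(line: str) -> bool:
    """Return True if the first line of a commit message matches the convention."""
    return bool(PATTERN.match(line))

print(is_valid_subject("AB-1234: Fix foo to enable bar"))  # True
print(is_valid_subject("Fix foo to enable bar"))           # True (ref is optional)
print(is_valid_subject(""))                                # False (empty subject)
```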
To communicate our designs in a graphical manner, it is of importance to draw architectural diagrams. For this we use tooling, that supports us in our work. We need to have something that is written so that it can be processed by both people and machine, and we want to have version control on our diagrams.
We will write our architectural diagrams in Markdown-like files (.mmd) using the Mermaid syntax; to edit these diagrams one can use various plugins. For each project where it is needed, we will add the diagrams to the repository of the subject. The level of detail we will provide in the diagrams follows the C4 model metamodel for architecture diagramming.
Standardized Workflow: By maintaining architecture as code, it will be standardized in our workflow.
Version control on diagrams: By using version control, we will be able to collaborate easier on the diagrams, and we will be able to see the history of them.
Diagrams are stored as text: by keeping our diagrams next to our code, they are where you need them the most.
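As a small illustration, a C4 context diagram in Mermaid syntax might look like this; the systems and relations shown are made up purely for the example.

```mermaid
C4Context
    title Example system context (illustrative only)
    Person(user, "Civil servant", "Uses the reporting tool")
    System(tool, "Transparency tool", "Publishes algorithm assessments")
    System_Ext(register, "Algorithm register", "External publication platform")
    Rel(user, tool, "Uses")
    Rel(tool, register, "Publishes to")
```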
Containers allow us to package and run applications in a standardized and portable way. To be able to (re)use and share images, they need to be stored in a registry that is accessible by others.
There are many container registries. During research the following registries have been noted:
Docker Hub, GitHub Container Registry, Amazon Elastic Container Registry (ECR), Azure Container Registry (ACR), Google Artifact Registry (GAR), Red Hat Quay, GitLab Container Registry, Harbor, Sonatype Nexus Repository Manager, JFrog Artifactory.
We will use GitHub Container Registry.
This aligns best with the previously made choices for GitHub as a code repository and CI/CD workflow.
Traditionally, Docker Hub has been the place to publish images. Therefore, our images may be more difficult to discover.
The following assumptions are not (directly) covered by the chosen registry:
By using GitHub Container Registry we have a container registry we can use both internally and share with others. This decision has low impact: since the Open Container Initiative (OCI) image format is standardized, we can always move to another registry.
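For illustration, publishing an image to GitHub Container Registry follows the usual Docker flow. The image name below is a placeholder; the login, build, and push commands are shown commented out because they require credentials and a Dockerfile.

```shell
# Hypothetical image coordinates — the image name and tag are examples.
OWNER=minbzk
IMAGE="ghcr.io/${OWNER}/example-app:1.0.0"

# Authenticate with a personal access token (write:packages scope), then build and push:
# echo "$GITHUB_TOKEN" | docker login ghcr.io -u USERNAME --password-stdin
# docker build -t "$IMAGE" .
# docker push "$IMAGE"

echo "$IMAGE"
```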
The following sites have been consulted:
The AI validation team works transparently. Working with public funds warrants transparency toward the public. Additionally, being transparent aligns with the team's mission of increasing the transparency of public organizations. In line with this reasoning, it is important to be open to researchers interested in the work of the AI validation team. Allowing researchers to conduct research within the team contributes to transparency and enables external perspectives and feedback to be incorporated into the team's work.
We have decided to include a researcher in residence as a member of our team.
The researcher in residence takes the following form:
The following conditions apply to the researcher in residence.
Risks around a potential chilling effect (team members not feeling free to express themselves) are mitigated by the conditions we impose. In light of the form and conditions described above, we see no further significant risks.
Including a researcher in residence makes it easier for them to conduct research within both the team and the wider organization where the AI validation team operates. This benefits the quality of the research findings and the feedback provided to the team and organization.
Contact us at ai-validatie@minbzk.nl.
Product Owner
Robbert has been on a mission for over 15 years to enhance the transparency and collaboration within AI projects. Before joining this team, he founded several data science and tech companies (partly) dedicated to this cause. Robbert is passionate about solving complex problems where he connects business needs with technology and involves others in how these solutions can improve their work.
robbertbos
Robbert Bos
Researcher in Residence
Lucas is a PhD candidate conducting research into the regulation and governance of algorithmic discrimination by supervision and enforcement organizations. Lucas is our Researcher in Residence.
Lucas Haitsma
rug.nl
Engineer
Berry is a software engineer passionate about problem-solving and system optimization, with expertise in Go, Python, and C++. Specialized in architecting high-volume data processing systems and implementing Lean-Agile and DevOps practices. Experienced in managing end-to-end processes from hardware provisioning to software deployment and release.
berrydenhartog
Berry den Hartog
Engineering Manager
Anne used to be a Machine Learning Engineering Manager at Spotify and previously held roles at DPG Media, Blendle, and Google AI. He holds a PhD from the University of Amsterdam.
anneschuth
Anne Schuth
anneschuth.nl
After graduating in pure mathematics, Christopher transitioned into machine learning. He is passionate about solving complex problems, especially those that have a societal impact. His expertise lies in mathematics and machine learning theory, and he is skilled in Python.
ChristopherSpelt
Christopher Spelt
AI Ethics Lead
Willy specializes in AI governance, AI risk management, AI assurance and ethics-by-design. She is an advocate of AI standards and a member of several ethics committees.
FrieseWoudloper
Willy Tadema
Robbert is a highly enthusiastic full-stack engineer with a Bachelor's degree in Computer Science from the Hanze University of Applied Sciences in Groningen. He is passionate about building secure, compliant, and ethical solutions, and thrives in collaborative environments. Robbert is eager to leverage his skills and knowledge to help shape and propel the future of IT within the government.
uittenbroekrobbert
Robbert Uittenbroek
Laurens is a passionate guy with a love for innovation and doing things differently. With a background in Econometrics and Computer Science, he loves to tackle the IT challenges of the Government by helping other people through extensive knowledge sharing on stage, building neural networks himself, or building a strong community.
laurensWe
Laurens Weijs
This document contains a checklist with requirements for tools we could use to help with the transparency of algorithmic decision making.
The requirements are based on:
The requirements have been given a priority based on the MoSCoW scale to allow for tool comparison.
This document assesses standards that standardize the way algorithm assessments can be captured.
There are many algorithm assessments (e.g. IAMA, HUIDERIA, etc.), technical tests on performance (e.g. Accuracy, TP, FP, F1, etc), fairness and bias of algorithms (e.g. SHAP) and reporting formats available. The goal is to have a way of standardizing the way these different assessments and tests can be captured.
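The performance tests mentioned above reduce to a few formulas over a confusion matrix. As a quick illustration with made-up counts:

```python
# Made-up confusion-matrix counts, purely for illustration.
tp, fp, fn, tn = 40, 10, 5, 45

accuracy  = (tp + tn) / (tp + fp + fn + tn)   # fraction of correct predictions
precision = tp / (tp + fp)                    # how many flagged positives are real
recall    = tp / (tp + fn)                    # how many real positives are found
f1        = 2 * precision * recall / (precision + recall)  # harmonic mean

print(accuracy)        # 0.85
print(round(f1, 3))    # 0.842
```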
The most interesting existing capturing methods seem to be all based on Model Cards for Model Reporting, which are:
"Short documents accompanying trained machine learning models that provide benchmarked evaluation in a variety of conditions, such as across different cultural, demographic, or phenotypic groups (e.g., race, geographic location, sex, Fitzpatrick skin type) and intersectional groups (e.g., age and race, or sex and Fitzpatrick skin type) that are relevant to the intended application domains. Model cards also disclose the context in which models are intended to be used, details of the performance evaluation procedures, and other relevant information", proposed by Google. Note that "The proposed set of sections" in the Model Cards paper "are intended to provide relevant details to consider, but are not intended to be complete or exhaustive, and may be tailored depending on the model, context, and stakeholders."
Many companies implement their own version of Model Cards, for example Meta System Cards and the tools mentioned in the next section.
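As a sketch of what capturing such information might look like in practice, here is a minimal model card as a plain Python structure. The field names loosely follow the sections proposed in the Model Cards paper and are illustrative, not a fixed schema.

```python
# Illustrative minimal model card; field names and values are assumptions
# loosely following "Model Cards for Model Reporting", not a fixed schema.
model_card = {
    "model_details": {"name": "example-classifier", "version": "0.1.0"},
    "intended_use": "Demonstration only; not a production model.",
    "factors": ["age group", "geographic location"],
    "metrics": {"accuracy": 0.91, "f1": 0.88},
    "evaluation_data": "Held-out test split of the training dataset.",
    "ethical_considerations": "Assess subgroup performance before deployment.",
}

print(sorted(model_card))  # the section names, alphabetically
```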
There exist tools to (semi-)automatically generate model cards:
A landscape analysis of ML documentation tools has been performed by Hugging Face and provides a good overview of the current landscape.
Another interesting standard is the Algorithmic Transparency Recording Standard of the United Kingdom Government, which can be found here.
We need a standard that captures algorithmic assessments and technical tests on models and datasets. The idea of model cards can serve as a guiding theoretical principle on how to implement such a standard. More specifically, we can draw inspiration from the existing model card schemas and implementations of VerifyML and Hugging Face. We note the following:
Hence in any case we need to extend one of these standards. We propose to:
In our ongoing research on AI validation and transparency, we are seeking tools to support assessments. Ideal tools would combine various technical tests with checklists and questionnaires and have the ability to generate reports in both human-friendly and machine-exchangeable formats.
This document contains a list of tools we have found and may want to investigate further.
AI Verify is an AI governance testing framework and software toolkit that validates the performance of AI systems against a set of internationally recognised principles through standardised tests, and is consistent with international AI governance frameworks such as those from the European Union, the OECD, and Singapore.
Links: AI Verify Homepage, AI Verify documentation, AI Verify Github.
What is it? VerifyML is an opinionated, open-source toolkit and workflow to help companies implement human-centric AI practices. It seems pretty much equivalent to AI Verify.
Why interesting? The functionality of this toolkit seems to match closely with those of AI Verify. It has a "git and code first approach" and has automatic generation of model cards.
Remarks The code seems to be last updated 2 years ago.
Links: VerifyML, VerifyML GitHub
What is it? Open source Python libraries that support interpretability and explainability of datasets and machine learning models. The most relevant toolkits are AI Fairness 360 and AI Explainability 360.
Why interesting? Seems to encompass extensive fairness and explainability tests. Codebase seems to be active.
Remarks It comes as Python and R libraries.
Links: AI Fairness 360 Github, AI Explainability 360 Github.
What is it? Open source tool to assess and improve the trustworthiness of AI systems. Offers tools to measure and mitigate bias across numerous tasks. Will be extended to include tools for efficacy, robustness, privacy and explainability.
Why interesting? Although it is not entirely clear what exactly this tool does (see Remarks), it does seem (according to their website) to provide reports on bias and fairness. The GitHub repo does not seem to include any report-generating code, but mainly technical tests. Here is an example in which bias is measured in a classification model.
Remarks The website seems to suggest the possibility to generate reports, but this is not directly reflected in the codebase. Possibly reports are only available with some sort of licensed product?
Links: Holistic AI homepage, Holistic AI GitHub.
AI Assessment Tool Belgium. The tool is based on the ALTAI recommendations published by the European Commission. Although it only includes questionnaires it does give an interesting way of reporting the end results. Does not include any technical tests at this point.
What-if. Provides an interface for expanding understanding of a black-box classification or regression ML model. Can be accessed through TensorBoard or as an extension in a Jupyter or Colab notebook. Does not seem to be an active codebase.
Aequitas. Open source bias auditing and Fair ML toolkit. This already seems to be contained within AI Verify, at least the 'fairness tree'.
Facets. Open source toolkit for understanding and analyzing ML datasets. Note that it does not include ML models.
Fairness Indicators. Open source Python package which enables easy computation of commonly-identified fairness metrics for binary and multiclass classifiers. Part of TensorFlow.
Fairlearn. Open source Python package that empowers developers of AI systems to assess their system's fairness and mitigate any observed unfairness issues.
Dalex. The DALEX package xrays any model and helps to explore and explain its behaviour, and helps to understand how complex models work. The main function explain() creates a wrapper around a predictive model. Wrapped models may then be explored and compared with a collection of local and global explainers. Recent developments from the area of Interpretable Machine Learning/eXplainable Artificial Intelligence.
SigmaRed. SigmaRed platform enables comprehensive third-party AI risk management (AI TPRM) and rapidly reduces the cycle time of conducting AI risks assessments while providing deep visibility, control, stakeholder based reporting, and detailed evidence repository. Does not seem to be open source.
Anch.ai. The end-to-end cloud solution empowers global data-driven organizations to govern and deploy responsible, transparent, and explainable AI aligned with upcoming EU regulation AI Act. Does not seem to be open source.
CredoAI. Credo AI is an AI governance platform that helps companies adopt, scale, and govern AI safely and effectively. Does not seem to be open source.
Paper by TNO about the FATE system. The acronym stands for "FAir, Transparent and Explainable Decision Making."
Tools mentioned include some of the above: Aequitas, AI Fairness 360, Dalex, Fairlearn, Responsibly, and What-If-Tool.
Links: Paper, Article, Microsoft links.
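To make concrete the kind of metric many of the fairness toolkits above compute, here is demographic parity difference (the gap in positive-prediction rates between groups) in plain Python. The data is made up purely for illustration.

```python
# Demographic parity difference: |P(pred=1 | group a) - P(pred=1 | group b)|.
# Predictions and group labels below are fabricated example data.
def positive_rate(preds, groups, group):
    """Fraction of positive predictions within one group."""
    selected = [p for p, g in zip(preds, groups) if g == group]
    return sum(selected) / len(selected)

preds  = [1, 0, 1, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = {g: positive_rate(preds, groups, g) for g in set(groups)}
dp_diff = abs(rates["a"] - rates["b"])
print(dp_diff)  # 0.5  (group a: 3/4 positives, group b: 1/4)
```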
The purpose of a code review is to ensure the quality and readability of a change, and to verify that all requirements from the ticket have been met, before the change gets merged into the main codebase. Additionally, code reviews are a communication tool: they allow team members to stay aware of changes being made.
Code reviews involve having a team member examine the changes made by another team member and give feedback or ask questions if needed.
We use GitHub pull requests (PR) for code reviews. You can make a draft PR if your work is still in progress. When you are done you can remove the draft status. A team member may start reviewing when the PR does not have a draft status.
For team ADRs at least 3 accepting reviews are required, or all team members should accept if it can be expected that the ADR is controversial.
A team ADR is an ADR made in the ai-validation repository.
All other PRs only need at least 1 reviewer to get accepted, but can have more reviewers if desired (by either reviewer or author).
By default the code owner, indicated in the CODEOWNERS file, will be requested to review. For us this is the GitHub team AI-validation. If the PR creator wants a specific team member to review, the PR creator should add the team member specifically in the reviewers section of the PR. A message in Mattermost will be posted for PRs. Then with the reaction of an emoji a reviewer will indicate they are looking at the PR.
If the reviewer has suggestions or comments the PR creator can apply those or respond to the suggestions. When the creator of the PR thinks they have addressed the feedback, they must re-request a review from the person that did the review. The reviewer must then look at the changes and approve or add more comments. This process continues until the reviewer agrees that all is correct and approves the PR.
Once the review is approved the reviewer checks if the branch is in sync with the main branch before merging. If not, the reviewer rebases the branch. Once the branch is in sync with main the reviewer merges the PR and checks if the deployment is successful. If the deployment is not successful the reviewer fixes it. If the PR needs more than one review, the last accepting reviewer merges the PR.
First off, thanks for taking the time to contribute! ❤️
All types of contributions are encouraged and valued. See the Table of Contents for different ways to help and details about how this project handles them. Please make sure to read the relevant section before making your contribution. It will make it a lot easier for us maintainers and smooth out the experience for all involved. The community looks forward to your contributions. 🎉
This project and everyone participating in it is governed by the Code of Conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior to ai-validatie@minbzk.nl.
Before you ask a question, it is best to search for existing Issues that might help you. In case you have found a suitable issue and still need clarification, you can write your question in this issue.
If you then still feel the need to ask a question and need clarification, we recommend the following:
We will then take care of the issue as soon as possible.
When contributing to this project, you must agree that you have authored 100% of the content, that you have the necessary rights to the content and that the content you contribute may be provided under the project license.
A good bug report shouldn't leave others needing to chase you up for more information. Therefore, we ask you to investigate carefully, collect information and describe the issue in detail in your report. Please complete the following steps in advance to help us fix any potential bug as fast as possible.
You must never report security related issues, vulnerabilities or bugs including sensitive information to the issue tracker, or elsewhere in public. Instead sensitive bugs must be sent by email to ai-validatie@minbzk.nl.
We use GitHub issues to track bugs and errors. If you run into an issue with the project:
Once it's filed:
needs-repro
needs-fix
critical
This section guides you through submitting an enhancement suggestion for this project, including completely new features and minor improvements. Following these guidelines will help maintainers and the community to understand your suggestion and find related suggestions.
Enhancement suggestions are tracked as GitHub issues.
We have commit message conventions: Commit convention
We use markdownlint to standardize markdown. MarkDown lint.
We use pre-commit to enable standardization. Pre-commit.
For clarity and consistency, this document defines some terms used within our team where the meaning in Data Science or Computer Science differs, and terms that are for any reason good to mention.
For a full reference for Machine Learning, we recommend ML Fundamentals from Google.
Make sure you have installed Mattermost, then follow these steps.
Make sure you have installed Webex, then follow these steps.
Create or use your existing GitHub account.
Bookmark these links in your browser:
We are assuming your dev machine is a Mac. This guide is rather opinionated, feel free to have your own opinion, and feel free to contribute! Contributing can be done by clicking "edit" at the top right and making a pull request on this repository.
Homebrew as the missing Package Manager
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Rectangle
brew install --cask rectangle
WebEx for video conferencing
brew install --cask webex
Mattermost for team communication
brew install --cask mattermost
Iterm2
brew install --cask iterm2
Oh My Zsh
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)"
Autosuggestions for zsh
git clone https://github.com/zsh-users/zsh-autosuggestions ~/.oh-my-zsh/custom/plugins/zsh-autosuggestions
Fish shell like syntax highlighting for Zsh
brew install zsh-syntax-highlighting
Add plugins to your shell in ~/.zshrc
~/.zshrc
plugins=(
  # other plugins...
  zsh-autosuggestions
  kubectl
  docker
  docker-compose
  pyenv
  z
)
Touch ID in Terminal
Sourcetree
brew install --cask sourcetree
Pyenv
brew install pyenv
pyenv virtualenv
brew install pyenv-virtualenv
Xcode Command Line Tools
xcode-select --install
TabbyML, an open-source, self-hosted AI coding assistant
We cannot simply use hosted versions of coding assistants because of privacy and copyright issues. We can, however, use self-hosted coding assistants, provided they are trained on data with permissive licenses.
StarCoder (1-7B) models are all trained on version 1.2 of The Stack dataset. This boils down to all open GitHub code with permissive licenses (193 licenses in total), minus opt-out requests.
Code Llama and Deepseek models are not clear enough about their data licenses.
brew install tabbyml/tabby/tabby
tabby serve --device metal --model TabbyML/StarCoder-3B
Then configure your IDE by installing a plugin.
Sign commits using SSH
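A sketch of the git configuration this involves. The key path is an example; `HOME` is redirected to a temporary directory here only so the snippet demonstrates the commands without touching your real config.

```shell
# Isolate the demonstration from your real ~/.gitconfig.
export HOME="$(mktemp -d)"

# Tell git to sign with SSH instead of GPG, pick a signing key (example path),
# and sign every commit by default.
git config --global gpg.format ssh
git config --global user.signingkey ~/.ssh/id_ed25519.pub
git config --global commit.gpgsign true

git config --global gpg.format   # prints: ssh
```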