From 4f1fc6bf1bed0d489433ea428c8323d6eab1f18a Mon Sep 17 00:00:00 2001
From: pjbull
Date: Mon, 1 Jul 2024 04:40:14 +0000
Subject: [PATCH] Add rendered files from build

---
 README.md             | 29 ++++++++++++++------------
 docs/docs/examples.md | 48 +++++++++++++++++++++----------------------
 docs/docs/index.md    | 29 ++++++++++++++------------
 3 files changed, 56 insertions(+), 50 deletions(-)

diff --git a/README.md b/README.md
index 99ae008..c094c1c 100644
--- a/README.md
+++ b/README.md
@@ -43,27 +43,30 @@ Dig into the checklist questions to identify and navigate the ethical considerat
 For more configuration details, see the sections on [command line options](#command-line-options), [supported output file types](#supported-file-types), and [custom checklists](#custom-checklists).
 
-# Background and perspective
+# What is `deon` designed to do?
 
-We have a particular perspective with this package that we will use to make decisions about contributions, issues, PRs, and other maintenance and support activities.
+We created `deon` to help data scientists across the sector be more intentional in their choices, and more aware of the ethical implications of their work. We use that perspective to make decisions about contributions, issues, PRs, and other maintenance and support activities.
 
-First and foremost, our goal is not to be arbitrators of what ethical concerns merit inclusion. We have a [process for changing the default checklist](#changing-the-checklist), but we believe that many domain-specific concerns are not included and teams will benefit from developing [custom checklists](#custom-checklists). Not every checklist item will be relevant. We encourage teams to remove items, sections, or mark items as `N/A` as the concerns of their projects dictate.
-Second, we built our initial list from a set of proposed items on [multiple checklists that we referenced](#checklist-citations). This checklist was heavily inspired by an article written by Mike Loukides, Hilary Mason, and DJ Patil and published by O'Reilly: ["Of Oaths and Checklists"](https://www.oreilly.com/ideas/of-oaths-and-checklists). We owe a great debt to the thinking that proceeded this, and we look forward to thoughtful engagement with the ongoing discussion about checklists for data science ethics.
+1. πŸ”“ **Our goal is not to be arbitrators of what ethical concerns merit inclusion**. We have a [process for changing the default checklist](#changing-the-checklist), but we believe that many domain-specific concerns are not included and teams will benefit from developing [custom checklists](#custom-checklists). Not every checklist item will be relevant. We encourage teams to remove items, sections, or mark items as `N/A` as the concerns of their projects dictate.
 
-Third, we believe in the power of examples to bring the principles of data ethics to bear on human experience. This repository includes a [list of real-world examples](http://deon.drivendata.org/examples/) connected with each item in the default checklist. We encourage you to contribute relevant use cases that you believe can benefit the community by their example. In addition, if you have a topic, idea, or comment that doesn't seem right for the documentation, please add it to the [wiki page](https://github.com/drivendataorg/deon/wiki) for this project!
+2. πŸ“Š This checklist is designed to provoke conversations around **issues where data scientists have particular responsibility and perspective**. It's not up to data scientists alone to decide what the ethical course of action is. This has always been a responsibility of organizations that are part of civil society. Conversations should be part of a larger organizational commitment to doing what is right.
 
-Fourth, it's not up to data scientists alone to decide what the ethical course of action is. This has always been a responsibility of organizations that are part of civil society. This checklist is designed to provoke conversations around issues where data scientists have particular responsibility and perspective. This conversation should be part of a larger organizational commitment to doing what is right.
+3. πŸ’¬ Items on the checklist are **meant to provoke discussion** among good-faith actors who take their ethical responsibilities seriously. We are working at a level of abstraction that cannot concretely recommend a specific action (e.g., "remove variable X from your model"). Because of this, most of the items are framed as prompts to discuss or consider. Teams will want to document these discussions and decisions for posterity.
 
-Fifth, we believe the primary benefit of a checklist is ensuring that we don't overlook important work. Sometimes it is difficult with pressing deadlines and a demand to multitask to make sure we do the hard work to think about the big picture. This package is meant to help ensure that those discussions happen, even in fast-moving environments. Ethics is hard, and we expect some of the conversations that arise from this checklist may also be hard.
+4. 🌎 We believe in the **power of examples** to bring the principles of data ethics to bear on human experience. This repository includes a [list of real-world examples](http://deon.drivendata.org/examples/) connected with each item in the default checklist. We encourage you to contribute relevant use cases that you believe can benefit the community by their example. In addition, if you have a topic, idea, or comment that doesn't seem right for the documentation, please add it to the [wiki page](https://github.com/drivendataorg/deon/wiki) for this project!
 
-Sixth, we are working at a level of abstraction that cannot concretely recommend a specific action (e.g., "remove variable X from your model"). Nearly all of the items on the checklist are meant to provoke discussion among good-faith actors who take their ethical responsibilities seriously. Because of this, most of the items are framed as prompts to discuss or consider. Teams will want to document these discussions and decisions for posterity.
+5. πŸ” We believe the primary benefit of a checklist is **ensuring that we don't overlook important work**. Sometimes, with pressing deadlines and a demand to multitask, it is difficult to make sure we do the hard work to think about the big picture. This package is meant to help ensure that those discussions happen, even in fast-moving environments.
 
-Seventh, we can't define exhaustively every term that appears in the checklist. Some of these terms are open to interpretation or mean different things in different contexts. We recommend that when relevant, users create their own glossary for reference.
+6. ❓ We can't define exhaustively every term that appears in the checklist. Some of these **terms are open to interpretation** or mean different things in different contexts. We recommend that when relevant, users create their own glossary for reference.
 
-Eighth, we want to avoid any items that strictly fall into the realm of statistical best practices. Instead, we want to highlight the areas where we need to pay particular attention above and beyond best practices.
+7. ✨ We want to avoid any items that strictly fall into the realm of statistical best practices. Instead, we want to highlight the areas where we need to pay particular attention **above and beyond best practices**.
 
-Ninth, we want all the checklist items to be as simple as possible (but no simpler), and to be actionable.
+8. βœ… We want all the checklist items to be **as simple as possible** (but no simpler), and to be actionable.
+
+## Sources
+
+We built our initial list from a set of proposed items on [multiple checklists that we referenced](#checklist-citations). This checklist was heavily inspired by an article written by Mike Loukides, Hilary Mason, and DJ Patil and published by O'Reilly: ["Of Oaths and Checklists"](https://www.oreilly.com/ideas/of-oaths-and-checklists). We owe a great debt to the thinking that preceded this, and we look forward to thoughtful engagement with the ongoing discussion about checklists for data science ethics.
 
 # Using this tool
@@ -266,9 +269,9 @@ We're excited to see so many articles popping up on data ethics! The short list
 - [Technology is biased too. How do we fix it?](https://fivethirtyeight.com/features/technology-is-biased-too-how-do-we-fix-it/)
 - [The dark secret at the heart of AI](https://www.technologyreview.com/s/604087/the-dark-secret-at-the-heart-of-ai/)
 
-## Where things have gone wrong
+## Data ethics in the real world
 
-To make the ideas contained in the checklist more concrete, we've compiled [examples](http://deon.drivendata.org/examples/) of times when things have gone wrong. They're paired with the checklist questions to help illuminate where in the process ethics discussions may have helped provide a course correction.
+To make the ideas contained in the checklist more concrete, we've compiled [examples](http://deon.drivendata.org/examples/) of times when tradeoffs were handled well, and times when things have gone wrong. They're paired with the checklist questions to help illuminate where in the process ethics discussions may have helped provide a course correction.
 
 We welcome contributions! Follow [these instructions](https://github.com/drivendataorg/deon/blob/main/CONTRIBUTING.md) to add an example.

diff --git a/docs/docs/examples.md b/docs/docs/examples.md
index 621f332..c1ee2fc 100644
--- a/docs/docs/examples.md
+++ b/docs/docs/examples.md
@@ -1,34 +1,34 @@
 
-# Where things have gone wrong
+# Data ethics in the real world
 
-To make the ideas contained in the checklist more concrete, we've compiled examples of times when things have gone wrong. They're paired with the checklist questions to help illuminate where in the process ethics discussions may have helped provide a course correction.
+To make the ideas contained in the checklist more concrete, we've compiled **examples** of times when tradeoffs were handled well, and times when things have gone wrong. Examples are paired with the checklist questions to help illuminate where in the process ethics discussions may have helped provide a course correction. Positive examples show how the principles of `deon` can be followed in the real world.
 
-Checklist Question | Examples of Ethical Issues
+Checklist Question | Examples
 --- | ---
 |
 **Data Collection**
-**A.1 Informed consent**: If there are human subjects, have they given informed consent, where subjects affirmatively opt-in and have a clear understanding of the data uses to which they consent? |
-**A.2 Collection bias**: Have we considered sources of bias that could be introduced during data collection and survey design and taken steps to mitigate those? |
-**A.3 Limit PII exposure**: Have we considered ways to minimize exposure of personally identifiable information (PII) for example through anonymization or not collecting information that isn't relevant for analysis? |
-**A.4 Downstream bias mitigation**: Have we considered ways to enable testing downstream results for biased outcomes (e.g., collecting data on protected group status like race or gender)? |
+**A.1 Informed consent**: If there are human subjects, have they given informed consent, where subjects affirmatively opt-in and have a clear understanding of the data uses to which they consent? |
+**A.2 Collection bias**: Have we considered sources of bias that could be introduced during data collection and survey design and taken steps to mitigate those? |
+**A.3 Limit PII exposure**: Have we considered ways to minimize exposure of personally identifiable information (PII) for example through anonymization or not collecting information that isn't relevant for analysis? |
+**A.4 Downstream bias mitigation**: Have we considered ways to enable testing downstream results for biased outcomes (e.g., collecting data on protected group status like race or gender)? |
 |
 **Data Storage**
-**B.1 Data security**: Do we have a plan to protect and secure data (e.g., encryption at rest and in transit, access controls on internal users and third parties, access logs, and up-to-date software)? |
-**B.2 Right to be forgotten**: Do we have a mechanism through which an individual can request their personal information be removed? |
-**B.3 Data retention plan**: Is there a schedule or plan to delete the data after it is no longer needed? |
+**B.1 Data security**: Do we have a plan to protect and secure data (e.g., encryption at rest and in transit, access controls on internal users and third parties, access logs, and up-to-date software)? |
+**B.2 Right to be forgotten**: Do we have a mechanism through which an individual can request their personal information be removed? |
+**B.3 Data retention plan**: Is there a schedule or plan to delete the data after it is no longer needed? |
 |
 **Analysis**
-**C.1 Missing perspectives**: Have we sought to address blindspots in the analysis through engagement with relevant stakeholders (e.g., checking assumptions and discussing implications with affected communities and subject matter experts)? |
-**C.2 Dataset bias**: Have we examined the data for possible sources of bias and taken steps to mitigate or address these biases (e.g., stereotype perpetuation, confirmation bias, imbalanced classes, or omitted confounding variables)? |
-**C.3 Honest representation**: Are our visualizations, summary statistics, and reports designed to honestly represent the underlying data? |
-**C.4 Privacy in analysis**: Have we ensured that data with PII are not used or displayed unless necessary for the analysis? |
-**C.5 Auditability**: Is the process of generating the analysis well documented and reproducible if we discover issues in the future? |
+**C.1 Missing perspectives**: Have we sought to address blindspots in the analysis through engagement with relevant stakeholders (e.g., checking assumptions and discussing implications with affected communities and subject matter experts)? |
+**C.2 Dataset bias**: Have we examined the data for possible sources of bias and taken steps to mitigate or address these biases (e.g., stereotype perpetuation, confirmation bias, imbalanced classes, or omitted confounding variables)? |
+**C.3 Honest representation**: Are our visualizations, summary statistics, and reports designed to honestly represent the underlying data? |
+**C.4 Privacy in analysis**: Have we ensured that data with PII are not used or displayed unless necessary for the analysis? |
+**C.5 Auditability**: Is the process of generating the analysis well documented and reproducible if we discover issues in the future? |
 |
 **Modeling**
-**D.1 Proxy discrimination**: Have we ensured that the model does not rely on variables or proxies for variables that are unfairly discriminatory? |
-**D.2 Fairness across groups**: Have we tested model results for fairness with respect to different affected groups (e.g., tested for disparate error rates)? |
-**D.3 Metric selection**: Have we considered the effects of optimizing for our defined metrics and considered additional metrics? |
-**D.4 Explainability**: Can we explain in understandable terms a decision the model made in cases where a justification is needed? |
-**D.5 Communicate limitations**: Have we communicated the shortcomings, limitations, and biases of the model to relevant stakeholders in ways that can be generally understood? |
+**D.1 Proxy discrimination**: Have we ensured that the model does not rely on variables or proxies for variables that are unfairly discriminatory? |
+**D.2 Fairness across groups**: Have we tested model results for fairness with respect to different affected groups (e.g., tested for disparate error rates)? |
+**D.3 Metric selection**: Have we considered the effects of optimizing for our defined metrics and considered additional metrics? |
+**D.4 Explainability**: Can we explain in understandable terms a decision the model made in cases where a justification is needed? |
+**D.5 Communicate limitations**: Have we communicated the shortcomings, limitations, and biases of the model to relevant stakeholders in ways that can be generally understood? |
 |
 **Deployment**
-**E.1 Monitoring and evaluation**: Do we have a clear plan to monitor the model and its impacts after it is deployed (e.g., performance monitoring, regular audit of sample predictions, human review of high-stakes decisions, reviewing downstream impacts of errors or low-confidence decisions, testing for concept drift)? |
-**E.2 Redress**: Have we discussed with our organization a plan for response if users are harmed by the results (e.g., how does the data science team evaluate these cases and update analysis and models to prevent future harm)? |
-**E.3 Roll back**: Is there a way to turn off or roll back the model in production if necessary? |
-**E.4 Unintended use**: Have we taken steps to identify and prevent unintended uses and abuse of the model and do we have a plan to monitor these once the model is deployed? |
+**E.1 Monitoring and evaluation**: Do we have a clear plan to monitor the model and its impacts after it is deployed (e.g., performance monitoring, regular audit of sample predictions, human review of high-stakes decisions, reviewing downstream impacts of errors or low-confidence decisions, testing for concept drift)? |
+**E.2 Redress**: Have we discussed with our organization a plan for response if users are harmed by the results (e.g., how does the data science team evaluate these cases and update analysis and models to prevent future harm)? |
+**E.3 Roll back**: Is there a way to turn off or roll back the model in production if necessary? |
+**E.4 Unintended use**: Have we taken steps to identify and prevent unintended uses and abuse of the model and do we have a plan to monitor these once the model is deployed? |

diff --git a/docs/docs/index.md b/docs/docs/index.md
index c597afe..a8f75db 100644
--- a/docs/docs/index.md
+++ b/docs/docs/index.md
@@ -36,27 +36,30 @@ Dig into the checklist questions to identify and navigate the ethical considerat
 For more configuration details, see the sections on [command line options](#command-line-options), [supported output file types](#supported-file-types), and [custom checklists](#custom-checklists).
 
-# Background and perspective
+# What is `deon` designed to do?
 
-We have a particular perspective with this package that we will use to make decisions about contributions, issues, PRs, and other maintenance and support activities.
+We created `deon` to help data scientists across the sector be more intentional in their choices, and more aware of the ethical implications of their work. We use that perspective to make decisions about contributions, issues, PRs, and other maintenance and support activities.
 
-First and foremost, our goal is not to be arbitrators of what ethical concerns merit inclusion. We have a [process for changing the default checklist](#changing-the-checklist), but we believe that many domain-specific concerns are not included and teams will benefit from developing [custom checklists](#custom-checklists). Not every checklist item will be relevant. We encourage teams to remove items, sections, or mark items as `N/A` as the concerns of their projects dictate.
-Second, we built our initial list from a set of proposed items on [multiple checklists that we referenced](#checklist-citations). This checklist was heavily inspired by an article written by Mike Loukides, Hilary Mason, and DJ Patil and published by O'Reilly: ["Of Oaths and Checklists"](https://www.oreilly.com/ideas/of-oaths-and-checklists). We owe a great debt to the thinking that proceeded this, and we look forward to thoughtful engagement with the ongoing discussion about checklists for data science ethics.
+1. πŸ”“ **Our goal is not to be arbitrators of what ethical concerns merit inclusion**. We have a [process for changing the default checklist](#changing-the-checklist), but we believe that many domain-specific concerns are not included and teams will benefit from developing [custom checklists](#custom-checklists). Not every checklist item will be relevant. We encourage teams to remove items, sections, or mark items as `N/A` as the concerns of their projects dictate.
 
-Third, we believe in the power of examples to bring the principles of data ethics to bear on human experience. This repository includes a [list of real-world examples](http://deon.drivendata.org/examples/) connected with each item in the default checklist. We encourage you to contribute relevant use cases that you believe can benefit the community by their example. In addition, if you have a topic, idea, or comment that doesn't seem right for the documentation, please add it to the [wiki page](https://github.com/drivendataorg/deon/wiki) for this project!
+2. πŸ“Š This checklist is designed to provoke conversations around **issues where data scientists have particular responsibility and perspective**. It's not up to data scientists alone to decide what the ethical course of action is. This has always been a responsibility of organizations that are part of civil society. Conversations should be part of a larger organizational commitment to doing what is right.
 
-Fourth, it's not up to data scientists alone to decide what the ethical course of action is. This has always been a responsibility of organizations that are part of civil society. This checklist is designed to provoke conversations around issues where data scientists have particular responsibility and perspective. This conversation should be part of a larger organizational commitment to doing what is right.
+3. πŸ’¬ Items on the checklist are **meant to provoke discussion** among good-faith actors who take their ethical responsibilities seriously. We are working at a level of abstraction that cannot concretely recommend a specific action (e.g., "remove variable X from your model"). Because of this, most of the items are framed as prompts to discuss or consider. Teams will want to document these discussions and decisions for posterity.
 
-Fifth, we believe the primary benefit of a checklist is ensuring that we don't overlook important work. Sometimes it is difficult with pressing deadlines and a demand to multitask to make sure we do the hard work to think about the big picture. This package is meant to help ensure that those discussions happen, even in fast-moving environments. Ethics is hard, and we expect some of the conversations that arise from this checklist may also be hard.
+4. 🌎 We believe in the **power of examples** to bring the principles of data ethics to bear on human experience. This repository includes a [list of real-world examples](http://deon.drivendata.org/examples/) connected with each item in the default checklist. We encourage you to contribute relevant use cases that you believe can benefit the community by their example. In addition, if you have a topic, idea, or comment that doesn't seem right for the documentation, please add it to the [wiki page](https://github.com/drivendataorg/deon/wiki) for this project!
 
-Sixth, we are working at a level of abstraction that cannot concretely recommend a specific action (e.g., "remove variable X from your model"). Nearly all of the items on the checklist are meant to provoke discussion among good-faith actors who take their ethical responsibilities seriously. Because of this, most of the items are framed as prompts to discuss or consider. Teams will want to document these discussions and decisions for posterity.
+5. πŸ” We believe the primary benefit of a checklist is **ensuring that we don't overlook important work**. Sometimes, with pressing deadlines and a demand to multitask, it is difficult to make sure we do the hard work to think about the big picture. This package is meant to help ensure that those discussions happen, even in fast-moving environments.
 
-Seventh, we can't define exhaustively every term that appears in the checklist. Some of these terms are open to interpretation or mean different things in different contexts. We recommend that when relevant, users create their own glossary for reference.
+6. ❓ We can't define exhaustively every term that appears in the checklist. Some of these **terms are open to interpretation** or mean different things in different contexts. We recommend that when relevant, users create their own glossary for reference.
 
-Eighth, we want to avoid any items that strictly fall into the realm of statistical best practices. Instead, we want to highlight the areas where we need to pay particular attention above and beyond best practices.
+7. ✨ We want to avoid any items that strictly fall into the realm of statistical best practices. Instead, we want to highlight the areas where we need to pay particular attention **above and beyond best practices**.
 
-Ninth, we want all the checklist items to be as simple as possible (but no simpler), and to be actionable.
+8. βœ… We want all the checklist items to be **as simple as possible** (but no simpler), and to be actionable.
+
+## Sources
+
+We built our initial list from a set of proposed items on [multiple checklists that we referenced](#checklist-citations). This checklist was heavily inspired by an article written by Mike Loukides, Hilary Mason, and DJ Patil and published by O'Reilly: ["Of Oaths and Checklists"](https://www.oreilly.com/ideas/of-oaths-and-checklists). We owe a great debt to the thinking that preceded this, and we look forward to thoughtful engagement with the ongoing discussion about checklists for data science ethics.
 
 # Using this tool
@@ -259,9 +262,9 @@ We're excited to see so many articles popping up on data ethics! The short list
 - [Technology is biased too. How do we fix it?](https://fivethirtyeight.com/features/technology-is-biased-too-how-do-we-fix-it/)
 - [The dark secret at the heart of AI](https://www.technologyreview.com/s/604087/the-dark-secret-at-the-heart-of-ai/)
 
-## Where things have gone wrong
+## Data ethics in the real world
 
-To make the ideas contained in the checklist more concrete, we've compiled [examples](http://deon.drivendata.org/examples/) of times when things have gone wrong. They're paired with the checklist questions to help illuminate where in the process ethics discussions may have helped provide a course correction.
+To make the ideas contained in the checklist more concrete, we've compiled [examples](http://deon.drivendata.org/examples/) of times when tradeoffs were handled well, and times when things have gone wrong. They're paired with the checklist questions to help illuminate where in the process ethics discussions may have helped provide a course correction.
 
 We welcome contributions! Follow [these instructions](https://github.com/drivendataorg/deon/blob/main/CONTRIBUTING.md) to add an example.
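The rendered pages in this patch point readers to `deon`'s command line options and custom checklists without showing an invocation. As a rough sketch only, and not part of the patch itself, the snippet below assumes `deon` has been installed from PyPI and shells out to its documented `--output` flag; the file name `ETHICS.md` is just an illustrative choice, not something the tool requires.

```python
# Rough sketch, not part of the patch above: generate the default data ethics
# checklist by calling the deon CLI. Assumes `pip install deon` has already run.
# ETHICS.md is an illustrative output name; deon appends if the file exists.
import subprocess
from pathlib import Path

output_path = Path("ETHICS.md")
subprocess.run(["deon", "--output", str(output_path)], check=True)

# Preview the first few lines of the generated checklist.
print("\n".join(output_path.read_text().splitlines()[:10]))
```

Teams with a domain-specific checklist would point the same command at their own YAML file via the CLI's custom checklist option described in the linked configuration sections.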