Merge pull request #662 from BethanyJep/main

Updated content on Fairness lesson to reflect responsible AI
Carlotta Castelluccio committed 1 year ago via GitHub
commit 5bb2b6da5c

# Building Machine Learning solutions with responsible AI
![Summary of responsible AI in Machine Learning in a sketchnote](../../sketchnotes/ml-fairness.png)
> Sketchnote by [Tomomi Imura](https://www.twitter.com/girlie_mac)
Imagine what can happen when the data you are using to build these models lacks certain demographics, such as race, gender, political view, religion, or disproportionately represents such demographics. What about when the model's output is interpreted to favor some demographic? What is the consequence for the application?
In this lesson, you will:
- Raise your awareness of the importance of fairness in machine learning and fairness-related harms.
- Become familiar with the practice of exploring outliers and unusual scenarios to ensure reliability and safety.
- Gain an understanding of the need to empower everyone by designing inclusive systems.
- Explore how vital it is to protect the privacy and security of data and people.
- See the importance of having a glass-box approach to explain the behavior of AI models.
- Be mindful of how accountability is essential to building trust in AI systems.
## Prerequisite
### Fairness

AI systems should treat everyone fairly and avoid affecting similar groups of people in different ways.
**“Unfairness”** encompasses negative impacts, or “harms”, for a group of people, such as those defined in terms of race, gender, age, or disability status. The main fairness-related harms can be classified as:
- **Allocation**, if a gender or ethnicity, for example, is favored over another.
- **Quality of service**. If you train the data for one specific scenario but reality is much more complex, it leads to a poor performing service. For instance, a hand soap dispenser that seemed unable to sense people with dark skin. [Reference](https://gizmodo.com/why-cant-this-soap-dispenser-identify-dark-skin-1797931773)
- **Denigration**. To unfairly criticize and label something or someone. For example, an image labeling technology infamously mislabeled images of dark-skinned people as gorillas.
- **Over- or under- representation**. The idea is that a certain group is not seen in a certain profession, and any service or function that keeps promoting that is contributing to harm.
- **Stereotyping**. Associating a given group with pre-assigned attributes. For example, a language translation system between English and Turkish may have inaccuracies due to words with stereotypical associations to gender.
![translation to Turkish](images/gender-bias-translate-en-tr.png)
> translation to Turkish

![translation back to English](images/gender-bias-translate-tr-en.png)
> translation back to English

When designing and testing AI systems, we need to ensure that AI is fair and not programmed to make biased or discriminatory decisions, which human beings are also prohibited from making. Guaranteeing fairness in AI and machine learning remains a complex sociotechnical challenge.

### Reliability and safety

To build trust, AI systems need to be reliable, safe, and consistent under normal and unexpected conditions. It is important to know how AI systems will behave in a variety of situations, especially in outlier cases. When building AI solutions, there needs to be a substantial amount of focus on how to handle the wide variety of circumstances that the AI solutions will encounter. For example, a self-driving car needs to put people's safety as a top priority. As a result, the AI powering the car needs to consider all the possible scenarios that the car could come across, such as night, thunderstorms or blizzards, kids running across the street, pets, road construction, and so on. How well an AI system can handle a wide range of conditions reliably and safely reflects the level of anticipation the data scientist or AI developer considered during the design and testing of the system.

<!-- [![Implementing reliability & safety in AI ](https://img.youtube.com/vi/dnC8-uUZXSc/0.jpg)](https://youtu.be/dnC8-uUZXSc "Microsoft's Approach to Responsible AI")

> 🎥 Click the image above for a video: Ensure reliability and safety in AI -->

> [🎥 Click here for a video: reliability and safety in AI](https://www.microsoft.com/videoplayer/embed/RE4vvIl)
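As a small illustration of this kind of stress testing, here is a minimal sketch (a toy example using scikit-learn and a stand-in dataset, not this lesson's own code) that checks how a classifier's accuracy degrades as its inputs drift away from the conditions it was trained on:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in tabular dataset; imagine these are inputs to a safety-critical system.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print("accuracy on clean data:", round(model.score(X_test, y_test), 3))

# Stress test: add increasing amounts of noise to simulate unusual or
# out-of-distribution conditions and watch how performance degrades.
rng = np.random.default_rng(0)
for scale in (0.1, 1.0, 5.0):
    noisy = X_test + rng.normal(0.0, scale * X_test.std(axis=0), X_test.shape)
    print(f"accuracy with noise x{scale}:", round(model.score(noisy, y_test), 3))
```

A sharp drop under mild perturbation is a signal that the system may not behave safely in scenarios the training data did not anticipate.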
### Inclusiveness
AI systems should be designed to engage and empower everyone. When designing and implementing AI systems, data scientists and AI developers identify and address potential barriers in the system that could unintentionally exclude people. For example, there are 1 billion people with disabilities around the world. With the advancement of AI, they can access a wide range of information and opportunities more easily in their daily lives. By addressing the barriers, it creates opportunities to innovate and develop AI products with better experiences that benefit everyone.
![Inclusive systems for accessibility](images/accessibility.png)
> Inclusive systems for accessibility
> [🎥 Click here for a video: inclusiveness in AI](https://www.microsoft.com/videoplayer/embed/RE4vl9v)
### Security and privacy
AI systems should be safe and respect people's privacy. People have less trust in systems that put their privacy, information, or lives at risk. When training machine learning models, we rely on data to produce the best results. In doing so, the origin and integrity of the data must be considered. For example, was the data user-submitted or publicly available? Next, while working with the data, it is crucial to develop AI systems that can protect confidential information and resist attacks. As AI becomes more prevalent, protecting privacy and securing important personal and business information is becoming more critical and complex. Privacy and data security issues require especially close attention for AI because access to data is essential for AI systems to make accurate and informed predictions and decisions about people.

> [🎥 Click here for a video: security in AI](https://www.microsoft.com/videoplayer/embed/RE4voJF)
- As an industry we have made significant advancements in privacy and security, fueled significantly by regulations like the GDPR (General Data Protection Regulation).
- Yet with AI systems we must acknowledge the tension between the need for more personal data to make systems more personal and effective, and privacy.
### Transparency
AI systems should be understandable. A crucial part of transparency is explaining the behavior of AI systems and their components. Improving the understanding of AI systems requires that stakeholders comprehend how and why they function so that they can identify potential performance issues, safety and privacy concerns, biases, exclusionary practices, or unintended outcomes. We also believe that those who use AI systems should be honest and forthcoming about when, why, and how they choose to deploy them, as well as the limitations of the systems they use. For example, if a bank uses an AI system to support its consumer lending decisions, it is important to examine the outcomes and understand which data influences the system's recommendations. Governments are starting to regulate AI across industries, so data scientists and organizations must explain whether an AI system meets regulatory requirements, especially when there is an undesirable outcome.

> [🎥 Click here for a video: transparency in AI](https://www.microsoft.com/videoplayer/embed/RE4voJF)
- Because AI systems are so complex, it is hard to understand how they work and interpret the results.
- This lack of understanding affects the way these systems are managed, operationalized, and documented.
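One simple, model-agnostic way to get a glass-box view of which inputs drive a model's predictions is permutation importance. The sketch below is a generic scikit-learn example on a stand-in dataset, not a tool prescribed by this lesson:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

# Stand-in dataset; the point is the transparency check, not the task itself.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = LogisticRegression(max_iter=5000).fit(X, y)

# Shuffle each feature in turn and measure how much the score drops:
# features whose shuffling hurts the most are driving the predictions.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
ranked = sorted(zip(X.columns, result.importances_mean), key=lambda p: -p[1])
for name, score in ranked[:5]:
    print(f"{name}: {score:.3f}")
```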
### Accountability

The people who design and deploy AI systems must be accountable for how their systems operate.
> 🎥 Click the image above for a video: Warnings of Mass Surveillance Through Facial Recognition
One of the biggest questions for our generation, as the first generation that is bringing AI to society, is how to ensure that computers will remain accountable to people and how to ensure that the people who design computers remain accountable to everyone else.

Let us look at some examples.
#### Allocation
Consider a hypothetical system for screening loan applications. The system tends to pick white men as better candidates over other groups. As a result, loans are withheld from certain applicants.
Another example would be an experimental hiring tool developed by a large corporation to screen candidates. The tool systemically discriminated against one gender because its models were trained to prefer words associated with the other. The result was penalizing candidates whose resumes contained words such as "women's rugby team".
✅ Do a little research to find a real-world example of something like this.
#### Quality of Service
Researchers found that several commercial gender classifiers had higher error rates around images of women with darker skin tones as opposed to images of men with lighter skin tones. [Reference](https://www.media.mit.edu/publications/gender-shades-intersectional-accuracy-disparities-in-commercial-gender-classification/)
#### Stereotyping
A stereotypical gender view was found in machine translation. When translating "he is a nurse and she is a doctor" into Turkish, problems were encountered. Turkish is a genderless language which has one pronoun, "o", to convey a singular third person, but translating the sentence back from Turkish to English yields the stereotypical and incorrect "she is a nurse, and he is a doctor" (see the translation images earlier in this lesson).
#### Denigration
An image labeling technology infamously mislabeled images of dark-skinned people as gorillas. Mislabeling is harmful not just because the system made a mistake, but because it specifically applied a label that has a long history of being purposefully used to denigrate Black people.
[![AI: Ain't I a Woman?](https://img.youtube.com/vi/QxuyfWoVV98/0.jpg)](https://www.youtube.com/watch?v=QxuyfWoVV98 "AI, Ain't I a Woman?")
> 🎥 Click the image above for a video: AI, Ain't I a Woman - a performance showing the harm caused by racist denigration by AI
#### Over-representation or under-representation
Skewed image search results can be a good example of this harm. When searching images of professions with an equal or higher percentage of men than women, such as engineering, or CEO, watch for results that are more heavily skewed towards a given gender.
![Bing search for 'CEO'](images/ceos.png)
> This search on Bing for CEO produces inclusive results
These five main types of harm are not mutually exclusive, and a single system can exhibit more than one type of harm. In addition, each case varies in its severity. For instance, unfairly labeling someone as a criminal is a much more severe harm than mislabeling an image. It is important, however, to remember that even relatively non-severe harms can make people feel alienated or singled out and the cumulative impact can be extremely oppressive.
**Discussion**: Revisit some of the examples and see if they show different harms.
| | Allocation | Quality of service | Stereotyping | Denigration | Over- or under- representation |
| ----------------------- | :--------: | :----------------: | :----------: | :---------: | :----------------------------: |
| Automated hiring system | x | x | x | | x |
| Machine translation | | | | | |
| Photo labeling | | | | | |
## Detecting unfairness
There are many reasons why a given system behaves unfairly. Social biases, for example, might be reflected in the datasets used to train models. Hiring unfairness might have been exacerbated by over-reliance on historical data: by using the patterns in resumes submitted to a company over a 10-year period, one model determined that men were more qualified because the majority of resumes came from men, a reflection of past male dominance across the tech industry.
Inadequate data about a certain group of people can be the reason for unfairness. For example, image classifiers have a higher rate of error for images of dark-skinned people because darker skin tones were underrepresented in the data.
Wrong assumptions made during development cause unfairness too. For example, a facial analysis system intended to predict who is going to commit a crime based on images of people's faces can lead to damaging assumptions. This could lead to substantial harm for people who are misclassified.
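A first-pass check for the kind of historical skew described above is to look at group representation and selection rates in the training data before any model is trained. The sketch below uses a small, made-up hiring table; the column names and numbers are purely illustrative:

```python
import pandas as pd

# Made-up historical hiring data, purely for illustration.
resumes = pd.DataFrame({
    "gender": ["M"] * 80 + ["F"] * 20,
    "hired":  [1] * 30 + [0] * 50 + [1] * 5 + [0] * 15,
})

# Representation: who is present in the training data at all?
print(resumes["gender"].value_counts(normalize=True))

# Selection rate: how often is each group labeled positively?
print(resumes.groupby("gender")["hired"].mean())
```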
## Understand your models and build in fairness
Although many aspects of fairness are not captured in quantitative fairness metrics, and it is not possible to fully remove bias from a system to guarantee fairness, you are still responsible for detecting and mitigating fairness issues as much as possible.

When you are working with machine learning models, it is important to understand your models by means of assuring their interpretability and by assessing and mitigating unfairness.

## Impact assessment

Before training a machine learning model, it is important to conduct an impact assessment to understand the purpose of the AI system; what the intended use is; where it will be deployed; and who will be interacting with the system. These are helpful for reviewers or testers evaluating the system to know what factors to take into consideration when identifying potential risks and expected consequences.

The following are areas of focus when conducting an impact assessment:

* **Adverse impact on individuals**. Being aware of any restrictions or requirements, unsupported uses, or any known limitations hindering the system's performance is vital to ensure that the system is not used in a way that could cause harm to individuals.
* **Data requirements**. Gaining an understanding of how and where the system will use data enables reviewers to explore any data requirements you would need to be mindful of (e.g., GDPR or HIPAA data regulations). In addition, examine whether the source or quantity of data is substantial enough for training.
* **Summary of impact**. Gather a list of potential harms that could arise from using the system. Throughout the ML lifecycle, review whether the issues identified are mitigated or addressed.
* **Applicable goals** for each of the six core principles. Assess whether the goals from each of the principles are met and if there are any gaps.

## Debugging with responsible AI

Similar to debugging a software application, debugging an AI system is a necessary process of identifying and resolving issues in the system. There are many factors that can cause a model to not perform as expected or responsibly. Most traditional model performance metrics are quantitative aggregates of a model's performance, which are not sufficient to analyze how a model violates responsible AI principles. Furthermore, a machine learning model is often a black box, which makes it difficult to understand what drives its outcome or to provide an explanation when it makes a mistake. Later in this course, we will learn how to use the Responsible AI dashboard to help debug AI systems. The dashboard provides a holistic tool for data scientists and AI developers to perform the following (a usage sketch appears after the list):
* **Error analysis**. To identify the error distribution of the model that can affect the system's fairness or reliability.
* **Model overview**. To discover where there are disparities in the model's performance across data cohorts.
* **Data analysis**. To understand the data distribution and identify any potential bias in the data that could lead to fairness, inclusiveness, and reliability issues.
* **Model interpretability**. To understand what affects or influences the model's predictions. This helps in explaining the model's behavior, which is important for transparency and accountability.
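As a preview, here is a minimal sketch of wiring a trained model into that dashboard. It assumes the open-source `responsibleai` and `raiwidgets` packages from the Responsible AI Toolbox and uses a stand-in dataset; exact APIs may differ between versions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from responsibleai import RAIInsights
from raiwidgets import ResponsibleAIDashboard

# Stand-in tabular data; train/test frames must include the target column.
df = load_breast_cancer(as_frame=True).frame
train_df, test_df = df.iloc[:400], df.iloc[400:]

model = LogisticRegression(max_iter=5000).fit(
    train_df.drop(columns="target"), train_df["target"]
)

rai_insights = RAIInsights(
    model=model,
    train=train_df,
    test=test_df,
    target_column="target",
    task_type="classification",
)
rai_insights.explainer.add()       # model interpretability
rai_insights.error_analysis.add()  # error distribution across cohorts
rai_insights.compute()

ResponsibleAIDashboard(rai_insights)  # serves the interactive dashboard
```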
Let's use the loan selection example to isolate the case and figure out each factor's level of impact on the prediction.
## Assessment methods
1. **Identify harms (and benefits)**. The first step is to identify harms and benefits. Think about how actions and decisions can affect both potential customers and a business itself.
1. **Identify the affected groups**. Once you understand what kind of harms or benefits that can occur, identify the groups that may be affected. Are these groups defined by gender, ethnicity, or social group?
1. **Define fairness metrics**. Finally, define a metric so you have something to measure against in your work to improve the situation.
### Identify harms (and benefits)
What are the harms and benefits associated with lending? Think about false negative and false positive scenarios:

**False negatives** (reject, but Y=1) - in this case, an applicant who would be capable of repaying a loan is rejected. This is an adverse event because loans are withheld from qualified applicants.
**False positives** (accept, but Y=0) - in this case, the applicant does get a loan but eventually defaults. As a result, the applicant's case will be sent to a debt collection agency which can affect their future loan applications.
### Identify affected groups
The next step is to determine which groups are likely to be affected. For example, in case of a credit card application, a model might determine that women should receive much lower credit limits compared with their spouses who share household assets. An entire demographic, defined by gender, is thereby affected.
### Define fairness metrics
You have identified harms and an affected group, in this case, delineated by gender. Now, use the quantified factors to disaggregate their metrics. For example, using the data below, you can see that women have the largest false positive rate and men have the smallest, and that the opposite is true for false negatives.
✅ In a future lesson on Clustering, you will see how to build this 'confusion matrix' in code
| | False positive rate | False negative rate | count |
| ---------- | ------------------- | ------------------- | ----- |
| Women | 0.37 | 0.27 | 54032 |
| Men | 0.31 | 0.35 | 28620 |
| Non-binary | 0.33 | 0.31 | 1266 |
This table tells us several things. First, we note that there are comparatively few non-binary people in the data. The data is skewed, so you need to be careful how you interpret these numbers.
In this case, we have 3 groups and 2 metrics. When we are thinking about how our system affects the group of customers applying for loans, this may be sufficient, but when you want to define a larger number of groups, you may want to distill this to smaller sets of summaries. To do that, you can add more metrics, such as the largest difference or smallest ratio of each false negative and false positive.
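To make the disaggregation concrete, here is a minimal sketch of how a table like the one above could be computed with pandas. The data frame and its values are made up for illustration:

```python
import pandas as pd

# Made-up predictions: one row per applicant, with the group label,
# the true outcome (1 = would repay) and the model's decision.
df = pd.DataFrame({
    "group":  ["Women", "Women", "Men", "Men", "Non-binary", "Non-binary"],
    "y_true": [1, 0, 0, 1, 1, 0],
    "y_pred": [0, 1, 1, 1, 1, 0],
})

def rates(g):
    fp = ((g.y_pred == 1) & (g.y_true == 0)).sum()  # accepted, then defaults
    tn = ((g.y_pred == 0) & (g.y_true == 0)).sum()
    fn = ((g.y_pred == 0) & (g.y_true == 1)).sum()  # rejected, would have repaid
    tp = ((g.y_pred == 1) & (g.y_true == 1)).sum()
    return pd.Series({
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
        "count": len(g),
    })

print(df.groupby("group")[["y_true", "y_pred"]].apply(rates))
```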
✅ Stop and Think: What other groups are likely to be affected in the case of loan applications?
## Mitigating unfairness
To mitigate unfairness, explore the model to generate various mitigated models, and compare the tradeoffs each makes between accuracy and fairness to select the fairest model.

This introductory lesson does not dive deeply into the details of algorithmic unfairness mitigation, such as post-processing and reduction approaches, but here is a tool that you may want to try.
### Fairlearn
[Fairlearn](https://fairlearn.github.io/) is an open-source Python package that allows you to assess your systems' fairness and mitigate unfairness.
The tool helps you assess how a model's predictions affect different groups, enables you to compare multiple models by using fairness and performance metrics, and supplies a set of algorithms to mitigate unfairness in binary classification and regression.
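Before diving into the resources below, here is a minimal sketch of that assess-then-mitigate workflow on synthetic data. The metric and mitigation classes shown are part of Fairlearn's public API, though exact signatures may vary across versions:

```python
import numpy as np
from fairlearn.metrics import MetricFrame, false_negative_rate, false_positive_rate
from fairlearn.reductions import DemographicParity, ExponentiatedGradient
from sklearn.linear_model import LogisticRegression

# Synthetic loan-style data with a sensitive feature baked into the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
sex = rng.choice(["F", "M"], size=1000)
y = (X[:, 0] + 0.5 * (sex == "M") + rng.normal(size=1000) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Assess: disaggregate error rates by the sensitive feature.
mf = MetricFrame(
    metrics={"fpr": false_positive_rate, "fnr": false_negative_rate},
    y_true=y,
    y_pred=model.predict(X),
    sensitive_features=sex,
)
print(mf.by_group)      # per-group error rates, as in the table above
print(mf.difference())  # largest between-group gap per metric

# Mitigate: retrain under a demographic-parity constraint and re-assess.
mitigator = ExponentiatedGradient(LogisticRegression(), constraints=DemographicParity())
mitigator.fit(X, y, sensitive_features=sex)
y_mitigated = mitigator.predict(X)
```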
- Learn how to use the different components by checking out Fairlearn's [GitHub](https://github.com/fairlearn/fairlearn/)
- Explore the [user guide](https://fairlearn.github.io/main/user_guide/index.html) and [examples](https://fairlearn.github.io/main/auto_examples/index.html)
- Try some [sample notebooks](https://github.com/fairlearn/fairlearn/tree/master/notebooks).
- Learn [how to enable fairness assessments](https://docs.microsoft.com/azure/machine-learning/how-to-machine-learning-fairness-aml?WT.mc_id=academic-77952-leestott) of machine learning models in Azure Machine Learning.
- Check out these [sample notebooks](https://github.com/Azure/MachineLearningNotebooks/tree/master/contrib/fairness) for more fairness assessment scenarios in Azure Machine Learning.
---
## 🚀 Challenge
To prevent harms from being introduced in the first place, we should:
- have a diversity of backgrounds and perspectives among the people working on systems
- invest in datasets that reflect the diversity of our society
- develop better methods throughout the machine learning lifecycle for detecting and correcting responsible AI issues when they occur
Think about real-life scenarios where a model's untrustworthiness is evident in model-building and usage. What else should we consider?
## [Post-lecture quiz](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/6/)
## Review & Self Study
In this lesson, you have learned some basics of the concepts of fairness and unfairness in machine learning.
Watch these workshops to dive deeper into the topics:

- Fairness-related harms in AI systems: Examples, assessment, and mitigation by Hanna Wallach and Miro Dudik

[![Fairness-related harms in AI systems: Examples, assessment, and mitigation](https://img.youtube.com/vi/1RptHwfkx_k/0.jpg)](https://www.youtube.com/watch?v=1RptHwfkx_k "Fairness-related harms in AI systems: Examples, assessment, and mitigation")

> 🎥 Click the image above for a video: Fairness-related harms in AI systems: Examples, assessment, and mitigation by Hanna Wallach and Miro Dudik

- In pursuit of responsible AI: Bringing principles to practice by Besmira Nushi, Mehrnoosh Sameki and Amit Sharma

[![Responsible AI Toolbox: An open-source framework for building responsible AI](https://img.youtube.com/vi/tGgJCrA-MZU/0.jpg)](https://www.youtube.com/watch?v=tGgJCrA-MZU "RAI Toolbox: An open-source framework for building responsible AI")

> 🎥 Click the image above for a video: RAI Toolbox: An open-source framework for building responsible AI by Besmira Nushi, Mehrnoosh Sameki, and Amit Sharma
Also, read:
- Microsoft's FATE research group: [FATE: Fairness, Accountability, Transparency, and Ethics in AI - Microsoft Research](https://www.microsoft.com/research/theme/fate/)
Explore the Fairlearn toolkit:

- [Fairlearn](https://fairlearn.org/)

Explore the Responsible AI Toolbox:

- [Responsible AI Toolbox GitHub repository](https://github.com/microsoft/responsible-ai-toolbox)
Read about Azure Machine Learning's tools to ensure fairness:
- [Azure Machine Learning](https://docs.microsoft.com/azure/machine-learning/concept-fairness-ml?WT.mc_id=academic-77952-leestott)
## Assignment
[Explore RAI Toolbox](assignment.md)

@ -1,211 +0,0 @@
# Justicia en el Aprendizaje Automático
![Resumen de justicia en el aprendizaje automático en un sketchnote](../../../sketchnotes/ml-fairness.png)
> Sketchnote por [Tomomi Imura](https://www.twitter.com/girlie_mac)
## [Examen previo a la lección](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/5?loc=es)
## Introducción
En esta sección, comenzarás a descubrir como el aprendizaje automático puede y está impactando nuestra vida diaria. Incluso ahora mismo, hay sistemas y modelos involucrados en tareas diarias de toma de decisiones, como los diagnósticos del cuidado de la salud o detección del fraude. Es importante que estos modelos funcionen correctamente con el fin de proveer resultados justos para todos.
Imagina que podría pasar si los datos que usas para construir estos modelos carecen de cierta demografía, como es el caso de raza, género, punto de vista político, religión, o representan desproporcionadamente estas demografías. ¿Qué pasa cuando los resultados del modelo son interpretados en favor de alguna demografía? ¿Cuál es la consecuencia para la aplicación?
En esta lección, será capaz de:
- Tomar conciencia de la importancia de la justicia en el aprendizaje automático.
- Aprender acerca de daños relacionados a la justicia.
- Aprender acerca de la evaluación de la justicia y mitigación.
## Prerrequisitos
Como un prerrequisito, por favor toma la ruta de aprendizaje "Responsible AI Principles" y mira el vídeo debajo sobre el tema:
Aprende más acerca de la AI responsable siguiendo este [curso](https://docs.microsoft.com/es-es/learn/modules/responsible-ai-principles/?WT.mc_id=academic-77952-leestott)
[![Enfonque de Microsoft para la AI responsable](https://img.youtube.com/vi/dnC8-uUZXSc/0.jpg)](https://youtu.be/dnC8-uUZXSc "Enfoque de Microsoft para la AI responsable")
> 🎥 Haz clic en imagen superior para el vídeo: Enfoque de Microsoft para la AI responsable
## Injusticia en los datos y algoritmos
> "Si torturas los datos lo suficiente, estos confesarán cualquier cosa" - Ronald Coase
Esta oración suena extrema, pero es cierto que los datos pueden ser manipulados para soportar cualquier conclusión. Dicha manipulación puede ocurrir a veces de forma no intencional. Como humanos, todos tenemos sesgos, y muchas veces es difícil saber conscientemente cuando estás introduciendo un sesgo en los datos.
El garantizar la justicia en la AI y aprendizaje automático sigue siendo un desafío socio-tecnológico complejo. Esto quiere decir que no puede ser afrontado desde una perspectiva puramente social o técnica.
### Daños relacionados con la justicia
¿Qué quieres decir con injusticia? "injusticia" engloba impactos negativos, o "daños", para un grupo de personas, como aquellos definidos en términos de raza, género, edad o estado de discapacidad.
Los principales daños relacionados a la justicia pueden ser clasificados como de:
- **Asignación**, si un género o etnia, por ejemplo, se favorece sobre otro.
- **Calidad del servicio**. Si entrenas los datos para un escenario específico pero la realidad es mucho más compleja, esto conlleva a servicio de bajo rendimiento.
- **Estereotipo**. El asociar un cierto grupo con atributos preasignados.
- **Denigrado**. Criticar injustamente y etiquetar algo o alguien.
- **Sobre- o sub- representación** La idea es que un cierto grupo no es visto en una cierta profesión, y cualquier servicio o función que sigue promocionándolo está contribuyendo al daño.
Demos un vistazo a los ejemplos.
### Asignación
Considerar un sistema hipotético para seleccionar solicitudes de préstamo. El sistema tiende a seleccionar a hombres blancos como mejores candidatos por encima de otros grupos. Como resultado, los préstamos se retienen para ciertos solicitantes.
Otro ejemplo sería una herramienta experimental de contratación desarrollada por una gran corporación para seleccionar candidatos. La herramienta discriminó sistemáticamente un género de otro usando los modelos entrenados para preferir palabras asociadas con otras, lo cual resultó en candidatos penalizados cuyos currículos contienen palabras como "equipo de rugby femenino".
✅ Realiza una pequeña investigación para encontrar un ejemplo del mundo real de algo como esto.
### Calidad del servicio
Los investigadores encontraron que varios clasificadores de género comerciales tenían altas tasas de error en las imágenes de mujeres con tonos de piel más oscuros, al contrario que con imágenes de hombres con tonos de piel más claros. [Referencia](https://www.media.mit.edu/publications/gender-shades-intersectional-accuracy-disparities-in-commercial-gender-classification/)
Otro ejemplo infame es el dispensador de jabón para manos que parece no ser capaz de detectar a la gente con piel de color oscuro. [Referencia](https://gizmodo.com/why-cant-this-soap-dispenser-identify-dark-skin-1797931773)
### Estereotipo
La vista de género estereotipada fue encontrada en una traducción automática. Cuando se tradujo “Él es un enfermero y ella es una doctora” al turco, se encontraron los problemas. El turco es un idioma sin género el cual tiene un pronombre "o" para comunicar el singular de la tercera persona, pero al traducir nuevamente la oración del turco al inglés resulta la frase estereotipada e incorrecta de “Ella es una enfermera y él es un doctor”.
![Traducción al turco](../images/gender-bias-translate-en-tr.png)
![Traducción de nuevo al inglés](../images/gender-bias-translate-tr-en.png)
### Denigración
Una tecnología de etiquetado de imágenes horriblemente etiquetó imágenes de gente con color oscuro de piel como gorilas. El etiquetado incorrecto es dañino no solo porque el sistema cometió un error, sino porque específicamente aplicó una etiqueta que tiene una larga historia de ser usada a propósito para denigrar a la gente negra.
[![AI: ¿No soy una mujer?](https://img.youtube.com/vi/QxuyfWoVV98/0.jpg)](https://www.youtube.com/watch?v=QxuyfWoVV98 "AI, ¿No soy una mujer?")
> 🎥 Da clic en la imagen superior para el video: AI, ¿No soy una mujer? - un espectáculo que muestra el daño causado por la denigración racista de una AI.
### Sobre- o sub- representación
Los resultados de búsqueda de imágenes sesgados pueden ser un buen ejemplo de este daño. Cuando se buscan imágenes de profesiones con un porcentaje igual o mayor de hombres que de mujeres, como en ingeniería, o CEO, observa que los resultados están mayormente inclinados hacia un género dado.
![Búsqueda de CEO en Bing](../images/ceos.png)
> Esta búsqueda en Bing para 'CEO' produce resultados bastante inclusivos
Estos cinco tipos principales de daños no son mutuamente exclusivos, y un solo sistema puede exhibir más de un tipo de daño. Además, cada caso varía en severidad. Por ejemplo, etiquetar injustamente a alguien como un criminal es un daño mucho más severo que etiquetar incorrectamente una imagen. Es importante, sin embargo, el recordar que aún los daños relativamente no severos pueden hacer que la gente se sienta enajenada o señalada y el impacto acumulado puede ser extremadamente opresivo.
**Discusión**: Revisa algunos de los ejemplos y ve si estos muestran diferentes daños.
| | Asignación | Calidad del servicio | Estereotipo | Denigrado | Sobre- o sub- representación |
| ----------------------- | :--------: | :----------------: | :----------: | :---------: | :----------------------------: |
| Sistema de contratación automatizada | x | x | x | | x |
| Traducción automática | | | | | |
| Etiquetado de fotos | | | | | |
## Detectando injusticias
Hay varias razones por las que un sistema se comporta injustamente. Los sesgos sociales, por ejemplo, pueden ser reflejados en los conjutos de datos usados para entrenarlos. Por ejemplo, la injusticia en la contratación puede ser exacerbada por la sobre dependencia en los datos históricos. Al emplear patrones elaborados a partir de currículos enviados a la compañía en un período de 10 años, el modelo determinó que los hombres estaban más calificados porque la mayoría de los currículos provenían de hombres, reflejo del pasado dominio masculino en la industria tecnológica.
Los datos inadecuados acerca de cierto grupo de personas pueden ser la razón de la injusticia. Por ejemplo, los clasificadores de imágenes tienes una tasa de error más alta para imágenes de gente con piel oscura porque los tonos de piel más oscura fueron sub-representados en los datos.
Las suposiciones erróneas hechas durante el desarrollo también causan injusticia. Por ejemplo, un sistema de análisis facial intentó predecir quién cometerá un crimen basado en imágenes de los rostros de personas que pueden llevar a supuestos dañinos. Esto podría llevar a daños substanciales para las personas clasificadas erróneamente.
## Entiende tus modelos y construye de forma justa
A pesar de los muchos aspectos de justicia que no son capturados en métricas cuantitativas justas, y que no es posible borrar totalmente el sesgo de un sistema para garantizar la justicia, eres responsable de detectar y mitigar problemas de justicia tanto como sea posible.
Cuando trabajas con modelos de aprendizaje automático, es importante entender tus modelos asegurando su interpretabilidad y evaluar y mitigar injusticias.
Usemos el ejemplo de selección de préstamos para aislar el caso y averiguar el nivel de impacto de cada factor en la predicción.
## Métodos de evaluación
1. **Identifica daños (y beneficios)**. El primer paso es identificar daños y beneficios. Piensa en cómo las acciones y decisiones pueden afectar tanto a clientes potenciales como al negocio mismo.
2. **Identifica los grupos afectados**. Una vez que entendiste qué clase de daños o beneficios pueden ocurrir, identifica los grupos que podrían ser afectados. ¿Están estos grupos definidos por género, etnicidad, o grupo social?
3. **Define métricas de justicia**. Finalmente, define una métrica para así tener algo con qué medir en tu trabajo para mejorar la situación.
### Identifica daños (y beneficios)
¿Cuáles son los daños y beneficios asociados con el préstamo? Piensa en escenarios con falsos negativos y falsos positivos:
**Falsos negativos** (rechazado, pero Y=1) - en este caso, un solicitante que sería capaz de pagar un préstamo es rechazado. Esto es un evento adverso porque los recursos de los préstamos se retienen a los solicitantes calificados.
**Falsos positivos** (aceptado, pero Y=0) - en este caso, el solicitante obtiene un préstamo pero eventualmente incumple. Como resultado, el caso del solicitante será enviado a la agencia de cobro de deudas lo cual puede afectar en sus futuras solicitudes de préstamo.
### Identifica los grupos afectados
Los siguientes pasos son determinar cuales son los grupos que suelen ser afectados. Por ejemplo, en caso de una solicitud de tarjeta de crédito, un modelo puede determinar que las mujeres deberían recibir mucho menor límite de crédito comparado con sus esposos con los cuales comparten ingreso familiar. Una demografía entera, definida por género, es de este modo afectada.
### Define métricas de justicia
Has identificado los daños y un grupo afectado, en este caso, delimitado por género. Ahora, usa los factores cuantificados para desagregar sus métricas. Por ejemplo, usando los datos abajo, puedes ver que las mujeres tienen la tasa de falso positivo más grande y los hombres tienen la más pequeña, y que lo opuesto es verdadero para los falsos negativos.
✅ En una lección futura de Clustering, verás como construir esta 'matriz de confusión' en código
| | Tasa de falso positivo | Tasa de falso negativo | contador |
| ---------- | ------------------- | ------------------- | ----- |
| Mujeres | 0.37 | 0.27 | 54032 |
| Hombres | 0.31 | 0.35 | 28620 |
| No-binario | 0.33 | 0.31 | 1266 |
Esta tabla nos dice varias cosas. Primero, notamos que hay comparativamente pocas personas no-binarias en los datos. Los datos están sesgados, por lo que necesitas ser cuidadoso en cómo interpretas estos números.
En este caso, tenemos 3 grupos y 2 métricas. En el caso de cómo nuestro sistema afecta a los grupos de clientes con sus solicitantes de préstamo, esto puede ser suficiente, pero cuando quieres definir grupos mayores, querrás reducir esto a conjuntos más pequeños de resúmenes. Para hacer eso, puedes agregar más métricas, como la mayor diferencia o la menor tasa de cada falso negativo y falso positivo.
✅ Detente y piensa: ¿Qué otros grupos es probable se vean afectados a la hora de solicitar un préstamo?
## Mitigando injusticias
Para mitigar injusticias, explora el modelo para generar varios modelos mitigados y compara las compensaciones que se hacen entre la precisión y justicia para seleccionar el modelo más justo.
Esta lección introductoria no profundiza en los detalles de mitigación algorítmica de injusticia, como los enfoques de post-procesado y de reducciones, pero aquí tienes una herramiento que podrías probar:
### Fairlearn
[Fairlearn](https://fairlearn.github.io/) es un paquete Python de código abierto que te permite evaluar la justicia de tus sistemas y mitigar injusticias.
La herramienta te ayuda a evaluar cómo unos modelos de predicción afectan a diferentes grupos, permitiéndote comparar múltiples modelos usando métricas de rendimiento y justicia, y provee un conjunto de algoritmos para mitigar injusticia en regresión y clasificación binaria.
- Aprende cómo usar los distintos componentes revisando el repositorio de [GitHub](https://github.com/fairlearn/fairlearn/) de Fairlearn.
- Explora la [guía de usuario](https://fairlearn.github.io/main/user_guide/index.html), [ejemplos](https://fairlearn.github.io/main/auto_examples/index.html)
- Prueba algunos [notebooks de ejemplo](https://github.com/fairlearn/fairlearn/tree/master/notebooks).
- Aprende a [cómo activar evaluación de justicia](https://docs.microsoft.com/azure/machine-learning/how-to-machine-learning-fairness-aml?WT.mc_id=academic-77952-leestott) de los modelos de aprendizaje automático en Azure Machine Learning.
- Revisa estos [notebooks de ejemplo](https://github.com/Azure/MachineLearningNotebooks/tree/master/contrib/fairness) para más escenarios de evaluaciones de justicia en Azure Machine Learning.
---
## 🚀 Desafío
Para prevenir que los sesgos sean introducidos en primer lugar, debemos:
- Tener una diversidad de antecedentes y perspectivas entre las personas trabajando en los sistemas.
- Invertir en conjuntos de datos que reflejen la diversidad de nuestra sociedad.
- Desarrollar mejores métodos para la detección y corrección de sesgos cuando estos ocurren.
Piensa en escenarios de la vida real donde la injusticia es evidente en la construcción y uso de modelos. ¿Qué más debemos considerar?
## [Cuestionario posterior a la lección](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/6?loc=es)
## Revisión y autoestudio
En esta lección has aprendido algunos de los conceptos básicos de justicia e injusticia en el aprendizaje automático.
Mira este taller para profundizar en estos temas:
- YouTube: [Daños relacionados con la justicia en sistemas de AI: Ejemplos, evaluaciones, y mitigación - YouTube](https://www.youtube.com/watch?v=1RptHwfkx_k) por Hanna Wallach y Miro Dudik
También lee:
- Centro de recursos de Microsoft RAI: [Recursos de AI responsable Microsoft AI](https://www.microsoft.com/ai/responsible-ai-resources?activetab=pivot1%3aprimaryr4)
- Grupo de investigación de Microsoft FATE: [FATE: Fairness, Accountability, Transparency, and Ethics in AI - Microsoft Research](https://www.microsoft.com/research/theme/fate/)
Explorar la caja de herramientas de Fairlearn
[Fairlearn](https://fairlearn.org/)
Lee acerca de las herramientas de Azure Machine Learning para asegurar justicia
- [Azure Machine Learning](https://docs.microsoft.com/azure/machine-learning/concept-fairness-ml?WT.mc_id=academic-77952-leestott)
## Tarea
[Explora Fairlearn](../translations/assignment.es.md)

@ -1,212 +0,0 @@
# Equité dans le Machine Learning
![Résumé de l'équité dans le Machine Learning dans un sketchnote](../../../sketchnotes/ml-fairness.png)
> Sketchnote par [Tomomi Imura](https://www.twitter.com/girlie_mac)
## [Quiz préalable](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/5/?loc=fr)
## Introduction
Dans ce programme, nous allons découvrir comment le Machine Learning peut avoir un impact sur notre vie quotidienne. Encore aujourd'hui, les systèmes et les modèles sont impliqués quotidiennement dans les tâches de prise de décision, telles que les diagnostics de soins ou la détection de fraudes. Il est donc important que ces modèles fonctionnent bien afin de fournir des résultats équitables pour tout le monde.
Imaginons ce qui peut arriver lorsque les données que nous utilisons pour construire ces modèles manquent de certaines données démographiques, telles que la race, le sexe, les opinions politiques, la religion ou représentent de manière disproportionnée ces données démographiques. Qu'en est-il lorsque la sortie du modèle est interprétée pour favoriser certains éléments démographiques ? Quelle est la conséquence pour l'application l'utilisant ?
Dans cette leçon, nous :
- Sensibiliserons sur l'importance de l'équité dans le Machine Learning.
- En apprendrons plus sur les préjudices liés à l'équité.
- En apprendrons plus sur l'évaluation et l'atténuation des injustices.
## Prérequis
En tant que prérequis, veuillez lire le guide des connaissances sur les "Principes de l'IA responsable" et regarder la vidéo sur le sujet suivant :
En apprendre plus sur l'IA responsable en suivant ce [guide des connaissances](https://docs.microsoft.com/fr-fr/learn/modules/responsible-ai-principles/?WT.mc_id=academic-77952-leestott)
[![L'approche de Microsoft sur l'IA responsable](https://img.youtube.com/vi/dnC8-uUZXSc/0.jpg)](https://youtu.be/dnC8-uUZXSc "Microsoft's Approach to Responsible AI")
> 🎥 Cliquez sur l'image ci-dessus pour la vidéo : Microsoft's Approach to Responsible AI
## Injustices dans les données et les algorithmes
> "Si vous torturez les données assez longtemps, elles avoueront n'importe quoi" - Ronald Coase
Cette affirmation semble extrême, mais il est vrai que les données peuvent être manipulées pour étayer n'importe quelle conclusion. Une telle manipulation peut parfois se produire involontairement. En tant qu'êtres humains, nous avons tous des biais, et il est souvent difficile de savoir consciemment quand nous introduisons des biais dans les données.
Garantir l'équité dans l'IA et le Machine Learning reste un défi sociotechnique complexe. Cela signifie qu'il ne peut pas être abordé d'un point de vue purement social ou technique.
### Dommages liés à l'équité
Qu'entendons-nous par injustice ? Le terme « injustice » englobe les impacts négatifs, ou « dommages », pour un groupe de personnes, tels que ceux définis en termes de race, de sexe, d'âge ou de statut de handicap.
Les principaux préjudices liés à l'équité peuvent être classés comme suit :
- **Allocation**, si un sexe ou une ethnicité par exemple est favorisé par rapport à un autre.
- **Qualité de service**. Si vous entraînez les données pour un scénario spécifique mais que la réalité est plus complexe, cela résulte à de très mauvaises performances du service.
- **Stéréotypes**. Associer à un groupe donné des attributs pré-assignés.
- **Dénigration**. Critiquer et étiqueter injustement quelque chose ou quelqu'un.
- **Sur- ou sous- représentation**. L'idée est qu'un certain groupe n'est pas vu dans une certaine profession, et tout service ou fonction qui continue de promouvoir cette représentation contribue, in-fine, à nuire à ce groupe.
Regardons quelques exemples :
### Allocation
Envisageons un système hypothétique de filtrage des demandes de prêt : le système a tendance à choisir les hommes blancs comme de meilleurs candidats par rapport aux autres groupes. En conséquence, les prêts sont refusés à certains demandeurs.
Un autre exemple est un outil de recrutement expérimental développé par une grande entreprise pour sélectionner les candidats. L'outil discriminait systématiquement un sexe en utilisant des modèles qui ont été formés pour préférer les mots associés à d'autres. Cela a eu pour effet de pénaliser les candidats dont les CV contiennent des mots tels que « équipe féminine de rugby ».
✅ Faites une petite recherche pour trouver un exemple réel de ce type d'injustice.
### Qualité de Service
Les chercheurs ont découvert que plusieurs classificateurs commerciaux de sexe avaient des taux d'erreur plus élevés autour des images de femmes avec des teins de peau plus foncés par opposition aux images d'hommes avec des teins de peau plus clairs. [Référence](https://www.media.mit.edu/publications/gender-shades-intersectional-accuracy-disparities-in-commercial-gender-classification/)
Un autre exemple tristement célèbre est un distributeur de savon pour les mains qui ne semble pas capable de détecter les personnes ayant une couleur de peau foncée. [Référence](https://www.journaldugeek.com/2017/08/18/quand-un-distributeur-automatique-de-savon-ne-reconnait-pas-les-couleurs-de-peau-foncees/)
### Stéréotypes
Une vision stéréotypée du sexe a été trouvée dans la traduction automatique. Lors de la traduction de « il est infirmier et elle est médecin » en turc, des problèmes ont été rencontrés. Le turc est une langue sans genre et possède un pronom « o » pour transmettre une troisième personne du singulier. Cependant, la traduction de la phrase du turc à l'anglais donne la phrase incorrecte et stéréotypée suivante : « elle est infirmière et il est médecin ».
![Traduction en turc](../images/gender-bias-translate-en-tr.png)
![Traduction en anglais de nouveau](../images/gender-bias-translate-tr-en.png)
### Dénigration
Une technologie d'étiquetage d'images a notoirement mal étiqueté les images de personnes à la peau foncée comme des gorilles. L'étiquetage erroné est nocif, non seulement parce que le système fait des erreurs mais surtout car il a spécifiquement appliqué une étiquette qui a pour longtemps été délibérément détournée pour dénigrer les personnes de couleurs.
[![IA : Ne suis-je pas une femme ?](https://img.youtube.com/vi/QxuyfWoVV98/0.jpg)](https://www.youtube.com/watch?v=QxuyfWoVV98 "AI, Ain't I a Woman?")
> 🎥 Cliquez sur l'image ci-dessus pour la vidéo : AI, Ain't I a Woman - une performance montrant le préjudice causé par le dénigrement raciste par l'IA
### Sur- ou sous- représentation
Les résultats de recherche d'images biaisés peuvent être un bon exemple de ce préjudice. Lorsque nous recherchons des images de professions avec un pourcentage égal ou supérieur d'hommes que de femmes, comme l'ingénierie ou PDG, nous remarquons des résultats qui sont plus fortement biaisés en faveur d'un sexe donné.
![Recherche Bing pour PDG](../images/ceos.png)
> Cette recherche sur Bing pour « PDG » produit des résultats assez inclusifs
Ces cinq principaux types de préjudices ne sont pas mutuellement exclusifs et un même système peut présenter plus d'un type de préjudice. De plus, chaque cas varie dans sa gravité. Par exemple, étiqueter injustement quelqu'un comme un criminel est un mal beaucoup plus grave que de mal étiqueter une image. Il est toutefois important de se rappeler que même des préjudices relativement peu graves peuvent causer une aliénation ou une isolation de personnes et l'impact cumulatif peut être extrêmement oppressant.
**Discussion**: Revoyez certains des exemples et voyez s'ils montrent des préjudices différents.
| | Allocation | Qualité de service | Stéréotypes | Dénigration | Sur- or sous- représentation |
| ----------------------- | :--------: | :----------------: | :----------: | :---------: | :----------------------------: |
| Système de recrutement automatisé | x | x | x | | x |
| Traduction automatique | | | | | |
| Étiquetage des photos | | | | | |
## Détecter l'injustice
Il existe de nombreuses raisons pour lesquelles un système donné se comporte de manière injuste. Les préjugés sociaux, par exemple, pourraient se refléter dans les ensembles de données utilisés pour les former. Par exemple, l'injustice à l'embauche pourrait avoir été exacerbée par une confiance excessive dans les données historiques. Ainsi, en utilisant les curriculum vitae soumis à l'entreprise sur une période de 10 ans, le modèle a déterminé que les hommes étaient plus qualifiés car la majorité des CV provenaient d'hommes, reflet de la domination masculine passée dans l'industrie de la technologie.
Des données inadéquates sur un certain groupe de personnes peuvent être la cause d'une injustice. Par exemple, les classificateurs d'images avaient un taux d'erreur plus élevé pour les images de personnes à la peau foncée, car les teins de peau plus foncés étaient sous-représentés dans les données.
Des hypothèses erronées faites pendant le développement causent également des injustices. Par exemple, un système d'analyse faciale destiné à prédire qui va commettre un crime sur la base d'images de visages peut conduire à des hypothèses préjudiciables. Cela pourrait entraîner des dommages substantiels pour les personnes mal classées.
## Comprendre vos modèles et instaurer l'équité
Bien que de nombreux aspects de l'équité ne soient pas pris en compte dans les mesures d'équité quantitatives et qu'il ne soit pas possible de supprimer complètement les biais d'un système pour garantir l'équité, nous sommes toujours responsable de détecter et d'atténuer autant que possible les problèmes d'équité.
Lorsque nous travaillons avec des modèles de Machine Learning, il est important de comprendre vos modèles en garantissant leur interprétabilité et en évaluant et en atténuant les injustices.
Utilisons l'exemple de sélection de prêt afin de déterminer le niveau d'impact de chaque facteur sur la prédiction.
## Méthodes d'évaluation
1. **Identifier les préjudices (et les avantages)**. La première étape consiste à identifier les préjudices et les avantages. Réfléchissez à la façon dont les actions et les décisions peuvent affecter à la fois les clients potentiels et l'entreprise elle-même.
1. **Identifier les groupes concernés**. Une fois que vous avez compris le type de préjudices ou d'avantages qui peuvent survenir, identifiez les groupes susceptibles d'être touchés. Ces groupes sont-ils définis par le sexe, l'origine ethnique ou le groupe social ?
1. **Définir des mesures d'équité**. Enfin, définissez une métrique afin d'avoir quelque chose à comparer dans votre travail pour améliorer la situation.
### Identifier les préjudices (et les avantages)
Quels sont les inconvénients et les avantages associés au prêt ? Pensez aux faux négatifs et aux faux positifs :
**Faux négatifs** (rejeter, mais Y=1) - dans ce cas, un demandeur qui sera capable de rembourser un prêt est rejeté. Il s'agit d'un événement défavorable parce que les prêts sont refusées aux candidats qualifiés.
**Faux positifs** (accepter, mais Y=0) - dans ce cas, le demandeur obtient un prêt mais finit par faire défaut. En conséquence, le dossier du demandeur sera envoyé à une agence de recouvrement de créances, ce qui peut affecter ses futures demandes de prêt.
### Identifier les groupes touchés
L'étape suivante consiste à déterminer quels groupes sont susceptibles d'être touchés. Par exemple, dans le cas d'une demande de carte de crédit, un modèle pourrait déterminer que les femmes devraient recevoir des limites de crédit beaucoup plus basses par rapport à leurs conjoints qui partagent les biens du ménage. Tout un groupe démographique, défini par le sexe, est ainsi touché.
### Définir les mesures d'équité
Nous avons identifié les préjudices et un groupe affecté, dans ce cas, défini par leur sexe. Maintenant, nous pouvons utiliser les facteurs quantifiés pour désagréger leurs métriques. Par exemple, en utilisant les données ci-dessous, nous pouvons voir que les femmes ont le taux de faux positifs le plus élevé et les hommes ont le plus petit, et que l'inverse est vrai pour les faux négatifs.
✅ Dans une prochaine leçon sur le clustering, nous verrons comment construire cette 'matrice de confusion' avec du code
| | Taux de faux positifs | Taux de faux négatifs | Nombre |
| ---------- | ------------------- | ------------------- | ----- |
| Femmes | 0.37 | 0.27 | 54032 |
| Hommes | 0.31 | 0.35 | 28620 |
| Non binaire | 0.33 | 0.31 | 1266 |
Ce tableau nous dit plusieurs choses. Premièrement, nous notons qu'il y a relativement peu de personnes non binaires dans les données. Les données sont faussées, nous devons donc faire attention à la façon dont nous allons interpréter ces chiffres.
Dans ce cas, nous avons 3 groupes et 2 mesures. Lorsque nous pensons à la manière dont notre système affecte le groupe de clients avec leurs demandeurs de prêt, cela peut être suffisant. Cependant si nous souhaitions définir un plus grand nombre de groupes, nous allons sûrement devoir le répartir en de plus petits ensembles de mesures. Pour ce faire, vous pouvez ajouter plus de métriques, telles que la plus grande différence ou le plus petit rapport de chaque faux négatif et faux positif.
✅ Arrêtez-vous et réfléchissez : Quels autres groupes sont susceptibles d'être affectés par la demande de prêt ?
## Atténuer l'injustice
Pour atténuer l'injustice, il faut explorer le modèle pour générer divers modèles atténués et comparer les compromis qu'il fait entre précision et équité afin de sélectionner le modèle le plus équitable.
Cette leçon d'introduction ne plonge pas profondément dans les détails de l'atténuation des injustices algorithmiques, telles que l'approche du post-traitement et des réductions, mais voici un outil que vous voudrez peut-être essayer.
### Fairlearn
[Fairlearn](https://fairlearn.github.io/) est un package Python open source qui permet d'évaluer l'équité des systèmes et d'atténuer les injustices.
L'outil aide à évaluer comment les prédictions d'un modèle affectent différents groupes, en permettant de comparer plusieurs modèles en utilisant des mesures d'équité et de performance, et en fournissant un ensemble d'algorithmes pour atténuer les injustices dans la classification binaire et la régression.
- Learn how to use the different components by checking out the Fairlearn documentation on [GitHub](https://github.com/fairlearn/fairlearn/)
- Explore the [user guide](https://fairlearn.github.io/main/user_guide/index.html) and the [examples](https://fairlearn.github.io/main/auto_examples/index.html)
- Try some [sample notebooks](https://github.com/fairlearn/fairlearn/tree/master/notebooks).
- Learn [how to enable fairness assessments](https://docs.microsoft.com/fr-fr/azure/machine-learning/how-to-machine-learning-fairness-aml?WT.mc_id=academic-77952-leestott) of machine learning models in Azure Machine Learning.
- Check out these [sample notebooks](https://github.com/Azure/MachineLearningNotebooks/tree/master/contrib/fairness) for more fairness assessment scenarios in Azure Machine Learning.
---
## 🚀 Challenge
To prevent biases from being introduced in the first place, we should:
- have a diversity of backgrounds and perspectives among the people working on systems
- invest in datasets that reflect the diversity of our society
- develop better methods for detecting and correcting bias when it occurs
Think about real-life scenarios where unfairness is evident in model building and usage. What else should we consider?
## [Post-lecture quiz](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/6/?loc=fr)
## Review & Self Study
In this lesson, you have learned some basics of the concepts of fairness and unfairness in machine learning.
Watch this workshop to dive deeper into the topics:
- YouTube: Fairness-related harms in AI systems: Examples, assessment, and mitigation by Hanna Wallach and Miro Dudik [Fairness-related harms in AI systems: Examples, assessment, and mitigation - YouTube](https://www.youtube.com/watch?v=1RptHwfkx_k)
Also read:
- Microsoft's RAI resource center: [Responsible AI Resources Microsoft AI](https://www.microsoft.com/fr-fr/ai/responsible-ai-resources?activetab=pivot1:primaryr4&rtc=1)
- Microsoft's FATE research group: [FATE: Fairness, Accountability, Transparency, and Ethics in AI - Microsoft Research](https://www.microsoft.com/research/theme/fate/)
Explore the Fairlearn toolkit
[Fairlearn](https://fairlearn.org/)
Read about Azure Machine Learning's tools to ensure fairness
- [Azure Machine Learning](https://docs.microsoft.com/fr-fr/azure/machine-learning/concept-fairness-ml?WT.mc_id=academic-77952-leestott)
## Assignment
[Explore Fairlearn](assignment.fr.md)

@ -1,213 +0,0 @@
# Fairness in Machine Learning
![Summary of Fairness in Machine Learning in a sketchnote](../../../sketchnotes/ml-fairness.png)
> Sketchnote by [Tomomi Imura](https://www.twitter.com/girlie_mac)
## [Pre-lecture quiz](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/5/)
## Introduction
In this curriculum, you will start to discover how machine learning can and is impacting our everyday lives. Even now, systems and models are involved in daily decision-making tasks, such as health care diagnoses or detecting fraud. So it is important that these models work well to provide fair outcomes for everyone.
Imagine what can happen when the data you are using to build these models lacks certain demographics, such as race, gender, political view, or religion, or disproportionately represents those demographics. What about when the model's output is interpreted to favor some demographic? What is the consequence for the application?
In this lesson, you will:
- Raise your awareness of the importance of fairness in machine learning.
- Learn about fairness-related harms.
- Learn about unfairness assessment and mitigation.
## Prerequisite
As a prerequisite, please take the "Responsible AI Principles" learn path and watch the video below on the topic:
Learn more about Responsible AI by following this [Learning Path](https://docs.microsoft.com/learn/modules/responsible-ai-principles/?WT.mc_id=academic-77952-leestott)
[![Microsoft's Approach to Responsible AI](https://img.youtube.com/vi/dnC8-uUZXSc/0.jpg)](https://youtu.be/dnC8-uUZXSc "Microsoft's Approach to Responsible AI")
> 🎥 Click the image above for a video: Microsoft's Approach to Responsible AI
## Unfairness in data and algorithms
> "If you torture the data long enough, it will confess to anything" - Ronald Coase
This statement sounds extreme, but it is true that data can be manipulated to support any conclusion. Such manipulation can sometimes happen unintentionally. As humans, we all have bias, and it is often difficult to consciously know when we are introducing bias into data.
Guaranteeing fairness in AI and machine learning remains a complex sociotechnical challenge, meaning that it cannot be addressed from either a purely social or a purely technical perspective.
### Fairness-related harms
What do we mean by unfairness? "Unfairness" encompasses negative impacts, or "harms", for a group of people, such as those defined in terms of race, gender, age, or disability status.
The main fairness-related harms can be classified as:
- **Allocation**, if a gender or an ethnicity, for example, is favored over another.
- **Quality of service**. If you train the data for one specific scenario but reality is much more complex, it leads to a poor performing service.
- **Stereotyping**. Associating a given group with pre-assigned attributes.
- **Denigration**. Unfairly criticizing and labeling something or someone.
- **Over- or under-representation**. The idea is that a certain group is not seen in a certain profession, and any service or function that keeps promoting that is contributing to harm.
Let's take a look at some examples.
### Allocation
Imagine a system for screening loan applications. The system tends to pick white men as better candidates over other groups. As a result, loans are withheld from certain applicants.
Another example would be an experimental hiring tool developed by a large corporation to screen candidates. The tool systematically discriminated against one gender by using models that were trained to prefer words associated with the other. It resulted in penalizing candidates whose resumes contained words such as "women's rugby team".
✅ Do a little research to find a real-world example of something like this
### Quality of service
Researchers found that several commercial gender classifiers had higher error rates around images of women with darker skin tones as opposed to images of men with lighter skin tones. [Reference](https://www.media.mit.edu/publications/gender-shades-intersectional-accuracy-disparities-in-commercial-gender-classification/)
Another infamous example is a hand soap dispenser that did not seem to be able to sense people with dark skin. [Reference](https://gizmodo.com/why-cant-this-soap-dispenser-identify-dark-skin-1797931773)
### Stereotyping
A stereotypical gender view was found in machine translation. When translating "he is a nurse and she is a doctor" into Turkish, problems arose. Turkish is a genderless language with a single third-person singular pronoun, "o", but translating the sentence back from Turkish to English yields the stereotypical and incorrect "she is a nurse and he is a doctor".
![translation to Turkish](../images/gender-bias-translate-en-tr.png)
![translation back to English](../images/gender-bias-translate-tr-en.png)
### Denigration
An image labeling technology infamously mislabeled images of dark-skinned people as gorillas. Mislabeling is harmful not just because the system made a mistake, but because it specifically applied a label with a long history of being purposefully used to denigrate Black people.
[![AI: Ain't I a Woman?](https://img.youtube.com/vi/QxuyfWoVV98/0.jpg)](https://www.youtube.com/watch?v=QxuyfWoVV98 "AI, Ain't I a Woman?")
> 🎥 Click the image above for a video: AI, Ain't I a Woman - a performance showing the harm caused by racist denigration by AI
### Over- or under-representation
Skewed image search results can be a good example of this harm. When searching for images of professions with an equal or higher percentage of men than women, such as engineering or CEO, notice results that are more heavily skewed towards a given gender.
![Bing CEO search](../images/ceos.png)
> This search on Bing for 'CEO' produces pretty inclusive results
These five main types of harm are not mutually exclusive, and a single system can exhibit more than one type of harm. In addition, each case varies in severity. For instance, unfairly labeling someone as a criminal is a much more severe harm than mislabeling an image. It is important, however, to remember that even relatively non-severe harms can make people feel alienated or singled out, and the cumulative impact can be extremely oppressive.
**Discussion**: Revisit some of the examples and see whether they show different harms.
|                         | Allocation | Quality of service | Stereotyping | Denigration | Over- or under-representation |
| ----------------------- | :--------: | :----------------: | :----------: | :---------: | :---------------------------: |
| Automated hiring system |     x      |         x          |      x       |             |               x               |
| Machine translation     |            |                    |              |             |                               |
| Photo labeling          |            |                    |              |             |                               |
## Detecting unfairness
There are many reasons why a given system behaves unfairly. Social biases, for example, might be reflected in the datasets used to train it. For example, hiring unfairness might have been exacerbated by over-reliance on historical data. By using patterns in resumes submitted to a company over a 10-year period, a model determined that men were more qualified, because the majority of the resumes came from men, a reflection of past male dominance in the tech industry.
Inadequate data about a certain group of people can be a reason for unfairness. For example, image classifiers have a higher rate of error for images of dark-skinned people because darker skin tones were underrepresented in the data.
Wrong assumptions made during development cause unfairness too. For example, a facial analysis system intended to predict who is going to commit a crime based on images of people's faces can lead to damaging assumptions. This could lead to substantial harm for misclassified people.
## Understand your models and build in fairness
Although many aspects of fairness are not captured by quantitative fairness metrics, and it is not possible to fully remove bias from a system to guarantee fairness, you are still responsible for detecting and mitigating fairness issues as much as possible.
When you are working with machine learning models, it is important to understand your models by ensuring their interpretability and by assessing and mitigating unfairness.
Let's use the loan selection example to isolate the case and figure out each factor's level of impact on the prediction.
## Assessment methods
1. **Identify harms (and benefits)**. The first step is to identify harms and benefits. Think about how actions and decisions can affect both potential customers and the business itself.
1. **Identify the affected groups**. Once you understand what kinds of harms or benefits can occur, identify the groups that may be affected. Are these groups defined by gender, ethnicity, or social group?
1. **Define fairness metrics**. Finally, define a metric so you have something to measure against in your work to improve the situation.
### Identify harms (and benefits)
What are the harms and benefits associated with lending? Think about false negative and false positive scenarios:
**False negatives** (rejected, but Y=1) - in this case, an applicant who would be able to repay a loan is rejected. This is an adverse event because loans are withheld from qualified applicants.
**False positives** (accepted, but Y=0) - in this case, the applicant does get a loan but eventually defaults. As a result, the applicant's case will be sent to a debt collection agency, which can affect their future loan applications.
### Identify the affected groups
The next step is to determine which groups are likely to be affected. For example, in the case of a credit card application, a model might determine that women should receive much lower credit limits than their spouses who share household assets. An entire demographic, defined by gender, is thereby affected.
### Define fairness metrics
You have identified harms and an affected group, in this case defined by gender. Now, use the quantified factors to disaggregate their metrics. For example, using the data below, you can see that women have the largest false positive rate and men have the smallest, and that the opposite is true for false negatives.
✅ In a future lesson on clustering, you will see how to build this 'confusion matrix' in code
|            | False positive rate | False negative rate | Count |
| ---------- | ------------------- | ------------------- | ----- |
| Women      | 0.37                | 0.27                | 54032 |
| Men        | 0.31                | 0.35                | 28620 |
| Non-binary | 0.33                | 0.31                | 1266  |
This table tells us several things. First, we note that there are comparatively few non-binary people in the data. The data is skewed, so we need to be careful how we interpret these numbers.
In this case, we have 3 groups and 2 metrics. When thinking about how our system affects this group of loan-applying customers, this may be sufficient; but if you want to define a larger number of groups, you may want to distill this down to smaller sets of summaries. To do that, you can add more metrics, such as the largest difference or the smallest ratio of each false negative and false positive rate, as sketched below.
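As a small illustration of those scalar summaries, Fairlearn's `MetricFrame` can aggregate per-group metrics into the largest between-group difference and the smallest between-group ratio. The data here is synthetic, so the printed values are meaningless beyond showing the mechanics:

```python
# Sketch: collapse per-group metrics into scalar summaries with MetricFrame.
import numpy as np
from fairlearn.metrics import MetricFrame, false_negative_rate, false_positive_rate

rng = np.random.default_rng(1)
group = rng.choice(["women", "men", "non-binary"], size=500, p=[0.6, 0.38, 0.02])
y_true = rng.integers(0, 2, size=500)
y_pred = rng.integers(0, 2, size=500)

mf = MetricFrame(
    metrics={"fpr": false_positive_rate, "fnr": false_negative_rate},
    y_true=y_true, y_pred=y_pred, sensitive_features=group,
)
print(mf.difference(method="between_groups"))  # largest gap per metric
print(mf.ratio(method="between_groups"))       # smallest ratio per metric
```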
✅ Stop and think: What other groups might be affected by the loan application?
## Mitigating unfairness
To mitigate unfairness, explore the model to generate various mitigated models, and compare the trade-offs each makes between accuracy and fairness to select the fairest model.
This introductory lesson does not dive deeply into the details of algorithmic unfairness mitigation, such as the post-processing and reductions approaches, but here is a tool you may want to try.
### Fairlearn
[Fairlearn](https://fairlearn.github.io/) is an open-source Python package that allows you to assess your systems' fairness and mitigate unfairness.
The tool helps you assess how a model's predictions affect different groups, lets you compare multiple models using fairness and performance metrics, and provides a set of algorithms to mitigate unfairness in binary classification and regression.
- Learn how to use the different components by checking out Fairlearn's [GitHub](https://github.com/fairlearn/fairlearn/)
- Explore the [user guide](https://fairlearn.github.io/main/user_guide/index.html) and the [examples](https://fairlearn.github.io/main/auto_examples/index.html)
- Try some [sample notebooks](https://github.com/fairlearn/fairlearn/tree/master/notebooks).
- Learn [how to enable fairness assessments](https://docs.microsoft.com/azure/machine-learning/how-to-machine-learning-fairness-aml?WT.mc_id=academic-77952-leestott) of machine learning models in Azure Machine Learning.
- Check out these [sample notebooks](https://github.com/Azure/MachineLearningNotebooks/tree/master/contrib/fairness) for more fairness assessment scenarios in Azure Machine Learning.
---
## 🚀 Challenge
To prevent biases from being introduced in the first place, we should:
- have a diversity of backgrounds and perspectives among the people working on systems
- invest in datasets that reflect the diversity of our society
- develop better methods for detecting and correcting bias when it occurs
Think about real-life scenarios where unfairness is evident in model building and usage. What else should we consider?
## [Post-lecture quiz](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/6/)
## Review & Self Study
In this lesson, you have learned some basics of the concepts of fairness and unfairness in machine learning.
Watch this workshop to dive deeper into the topics:
- YouTube: Fairness-related harms in AI systems: Examples, assessment, and mitigation by Hanna Wallach and Miro Dudik [Fairness-related harms in AI systems: Examples, assessment, and mitigation - YouTube](https://www.youtube.com/watch?v=1RptHwfkx_k)
You can also read:
- Microsoft's RAI resource center: [Responsible AI Resources Microsoft AI](https://www.microsoft.com/ai/responsible-ai-resources?activetab=pivot1%3aprimaryr4)
- Microsoft's FATE research group: [FATE: Fairness, Accountability, Transparency, and Ethics in AI - Microsoft Research](https://www.microsoft.com/research/theme/fate/)
Explore the Fairlearn toolkit
[Fairlearn](https://fairlearn.org/)
Read about Azure Machine Learning's tools to ensure fairness
- [Azure Machine Learning](https://docs.microsoft.com/azure/machine-learning/concept-fairness-ml?WT.mc_id=academic-77952-leestott)
## Assignment
[Explore Fairlearn](assignment.id.md)

@ -1,212 +0,0 @@
# Fairness and machine learning
![Summary of fairness in machine learning in a sketchnote](../../../sketchnotes/ml-fairness.png)
> Sketchnote by [Tomomi Imura](https://www.twitter.com/girlie_mac)
## [Pre-lecture quiz](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/5/?loc=it)
## Introduction
In this curriculum, you will start to discover how machine learning can and is impacting our everyday lives. Even now, systems and models are involved in daily decision-making tasks, such as health care diagnoses or detecting fraud. So it is important that these models work well in order to provide fair outcomes for everyone.
Imagine what can happen when the data you are using to build these models lacks certain demographics, such as race, gender, political view, or religion, or disproportionately represents such demographics. What about when the model's output is interpreted to favor some demographic? What is the consequence for the application?
In this lesson, you will:
- Raise your awareness of the importance of fairness in machine learning.
- Learn about fairness-related harms.
- Learn more about unfairness assessment and mitigation.
## Prerequisite
As a prerequisite, please take the "Responsible AI Principles" learn path and watch the video below on the topic:
Learn more about Responsible AI by following this [learning path](https://docs.microsoft.com/learn/modules/responsible-ai-principles/?WT.mc_id=academic-77952-leestott)
[![Microsoft's Approach to Responsible AI](https://img.youtube.com/vi/dnC8-uUZXSc/0.jpg)](https://youtu.be/dnC8-uUZXSc "Microsoft's Approach to Responsible AI")
> 🎥 Click the image above for a video: Microsoft's Approach to Responsible AI
## Unfairness in data and algorithms
> "If you torture the data long enough, it will confess to anything" - Ronald Coase
This statement sounds extreme, but it is true that data can be manipulated to support any conclusion. Such manipulation can sometimes happen unintentionally. As humans, we all have bias, and it is often difficult to consciously know when we are introducing bias into data.
Guaranteeing fairness in AI and machine learning remains a complex sociotechnical challenge, meaning that it cannot be addressed from either a purely social or a purely technical perspective.
### Fairness-related harms
What do we mean by unfairness? "Unfairness" encompasses negative impacts, or "harms", for a group of people, such as those defined in terms of race, gender, age, or disability status.
The main fairness-related harms can be classified as:
- **Allocation**, if a gender or an ethnicity, for example, is favored over another.
- **Quality of service**. If you train the data for one specific scenario but reality is much more complex, it leads to a poor performing service.
- **Stereotyping**. Associating a given group with pre-assigned attributes.
- **Denigration**. Unfairly criticizing and labeling something or someone.
- **Over- or under-representation**. The idea is that a certain group is not seen in a certain profession, and any service or function that keeps promoting that is contributing to harm.
Let's take a look at some examples.
### Allocation
Consider a hypothetical system for screening loan applications. The system tends to pick white men as better candidates over other groups. As a result, loans are withheld from certain applicants.
Another example would be an experimental hiring tool developed by a large corporation to screen candidates. The tool systematically discriminated against one gender by using models that were trained to prefer words associated with the other. It resulted in penalizing candidates whose resumes contained words such as "women's rugby team".
✅ Do a little research to find a real-world example of something like this
### Quality of service
Researchers found that several commercial gender classifiers had higher error rates around images of women with darker skin tones as opposed to images of men with lighter skin tones. [Reference](https://www.media.mit.edu/publications/gender-shades-intersectional-accuracy-disparities-in-commercial-gender-classification/)
Another infamous example is a hand soap dispenser that did not seem to be able to sense people with dark skin. [Reference](https://gizmodo.com/why-cant-this-soap-dispenser-identify-dark-skin-1797931773)
### Stereotyping
A stereotypical gender view was found in machine translation. When translating "he is a nurse and she is a doctor" into Turkish, problems were encountered. Turkish is a genderless language with a single third-person singular pronoun, "o", but translating the sentence back from Turkish to English yields the stereotypical and incorrect "she is a nurse and he is a doctor".
![translation to Turkish](../images/gender-bias-translate-en-tr.png)
![translation back to English](../images/gender-bias-translate-tr-en.png)
### Denigration
An image labeling technology infamously mislabeled images of dark-skinned people as gorillas. Mislabeling is harmful not just because the system made a mistake, but because it specifically applied a label with a long history of being purposefully used to denigrate Black people.
[![AI: Ain't I a Woman?](https://img.youtube.com/vi/QxuyfWoVV98/0.jpg)](https://www.youtube.com/watch?v=QxuyfWoVV98 "AI, Ain't I a Woman?")
> 🎥 Click the image above for a video: AI, Ain't I a Woman - a performance showing the harm caused by racist denigration by AI
### Over- or under-representation
Skewed image search results can be a good example of this harm. When searching for images of professions with an equal or higher percentage of men than women, such as engineering or CEO, notice results that are more heavily skewed towards a given gender.
![Bing CEO search](../images/ceos.png)
> This search on Bing for 'CEO' produces pretty inclusive results
These five main types of harm are not mutually exclusive, and a single system can exhibit more than one type of harm. In addition, each case varies in severity. For example, unfairly labeling someone as a criminal is a much more severe harm than mislabeling an image. It is important, however, to remember that even relatively non-severe harms can make people feel alienated or singled out, and the cumulative impact can be extremely oppressive.
**Discussion**: Revisit some of the examples and see whether they show different harms.
|                         | Allocation | Quality of service | Stereotyping | Denigration | Over- or under-representation |
| ----------------------- | :--------: | :----------------: | :----------: | :---------: | :---------------------------: |
| Automated hiring system |     x      |         x          |      x       |             |               x               |
| Machine translation     |            |                    |              |             |                               |
| Photo labeling          |            |                    |              |             |                               |
## Detecting unfairness
There are many reasons why a given system behaves unfairly. Social biases, for example, might be reflected in the datasets used to train it. For example, hiring unfairness might have been exacerbated by over-reliance on historical data. Using patterns in resumes submitted to a company over a 10-year period, a model determined that men were more qualified because the majority of the resumes came from men, a reflection of past male dominance in the tech industry.
Inadequate data about a certain group of people can be a reason for unfairness. For example, image classifiers have a higher rate of error for images of dark-skinned people because darker skin tones were underrepresented in the data.
Wrong assumptions made during development cause unfairness too. For example, a facial analysis system intended to predict who is going to commit a crime based on images of people's faces can lead to damaging assumptions. This could lead to substantial harm for misclassified people.
## Understand your models and build in fairness
Although many aspects of fairness are not captured by quantitative fairness metrics, and it is not possible to fully remove bias from a system to guarantee fairness, you are still responsible for detecting and mitigating fairness issues as much as possible.
When you are working with machine learning models, it is important to understand your models by ensuring their interpretability and by assessing and mitigating unfairness.
Let's use the loan selection example to isolate the case and determine each factor's level of impact on the prediction.
## Assessment methods
1. **Identify harms (and benefits)**. The first step is to identify harms and benefits. Think about how actions and decisions can affect both potential customers and the business itself.
1. **Identify the affected groups**. Once you understand what kinds of harms or benefits can occur, identify the groups that may be affected. Are these groups defined by gender, ethnicity, or social group?
1. **Define fairness metrics**. Finally, define a metric so you have something to measure against in your work to improve the situation.
### **Identify harms (and benefits)**
What are the harms and benefits associated with lending? Think about false negative and false positive scenarios:
**False negatives** (rejected, but Y=1) - in this case, an applicant who would be able to repay a loan is rejected. This is an adverse event because loans are withheld from qualified applicants.
**False positives** (accepted, but Y=0) - in this case, the applicant does get a loan but eventually defaults. As a result, the applicant's case will be sent to a debt collection agency, which can affect their future loan applications.
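To make the two definitions concrete, here is a tiny worked example with made-up labels that reads both rates off a confusion matrix:

```python
# Worked example: false negative and false positive rates from a confusion
# matrix. The labels are made up for illustration.
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 1, 0, 1]  # 1 = applicant would repay the loan
y_pred = [1, 0, 1, 1, 0, 1, 0, 0]  # 1 = model approves the loan

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("false negative rate:", fn / (fn + tp))  # qualified applicants rejected
print("false positive rate:", fp / (fp + tn))  # eventual defaulters approved
```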
### **Identify the affected groups**
The next step is to determine which groups are likely to be affected. For example, in the case of a credit card application, a model might determine that women should receive much lower credit limits than their spouses who share household assets. An entire demographic, defined by gender, is thereby affected.
### **Define fairness metrics**
You have identified harms and an affected group, in this case defined by gender. Now, use the quantified factors to disaggregate their metrics. For example, using the data below, you can see that women have the largest false positive rate and men have the smallest, and that the opposite is true for false negatives.
✅ In a future lesson on clustering, you will see how to build this 'confusion matrix' in code
|            | False positive rate | False negative rate | Count |
| ---------- | ------------------- | ------------------- | ----- |
| Women      | 0.37                | 0.27                | 54032 |
| Men        | 0.31                | 0.35                | 28620 |
| Non-binary | 0.33                | 0.31                | 1266  |
This table tells us several things. First, we note that there are comparatively few non-binary people in the data. The data is skewed, so we need to be careful how we interpret these numbers.
In this case, we have 3 groups and 2 metrics. When thinking about how our system affects this group of loan-applying customers, this may be sufficient; but if you want to define a larger number of groups, you may want to distill this down to smaller sets of summaries. To do that, you can add more metrics, such as the largest difference or the smallest ratio of each false negative and false positive rate.
✅ Stop and think: What other groups might be affected by the loan application?
## Mitigating unfairness
To mitigate unfairness, explore the model to generate various mitigated models, and compare the trade-offs each makes between accuracy and fairness to select the fairest model.
This introductory lesson does not dive deeply into the details of algorithmic unfairness mitigation, such as the post-processing and reductions approaches, but here is a tool you may want to try.
### Fairlearn
[Fairlearn](https://fairlearn.github.io/) is an open-source Python package that allows you to assess your systems' fairness and mitigate unfairness.
The tool helps you assess how a model's predictions affect different groups, lets you compare multiple models using fairness and performance metrics, and provides a set of algorithms to mitigate unfairness in binary classification and regression.
- Learn how to use the different components by checking out Fairlearn's [GitHub](https://github.com/fairlearn/fairlearn/)
- Explore the [user guide](https://fairlearn.github.io/main/user_guide/index.html) and the [examples](https://fairlearn.github.io/main/auto_examples/index.html)
- Try some [sample notebooks](https://github.com/fairlearn/fairlearn/tree/master/notebooks).
- Learn [how to enable fairness assessments](https://docs.microsoft.com/azure/machine-learning/how-to-machine-learning-fairness-aml?WT.mc_id=academic-77952-leestott) of machine learning models in Azure Machine Learning.
- Check out these [sample notebooks](https://github.com/Azure/MachineLearningNotebooks/tree/master/contrib/fairness) for more fairness assessment scenarios in Azure Machine Learning.
---
## 🚀 Challenge
To prevent biases from being introduced in the first place, we should:
- have a diversity of backgrounds and perspectives among the people working on systems
- invest in datasets that reflect the diversity of our society
- develop better methods for detecting and correcting bias when it occurs
Think about real-life scenarios where unfairness is evident in model building and usage. What else should we consider?
## [Post-lecture quiz](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/6/?loc=it)
## Review & Self Study
In this lesson, you have learned some basics of the concepts of fairness and unfairness in machine learning.
Watch this workshop to dive deeper into the topics:
- YouTube: Fairness-related harms in AI systems: Examples, assessment, and mitigation by Hanna Wallach and Miro Dudik [Fairness-related harms in AI systems: Examples, assessment, and mitigation - YouTube](https://www.youtube.com/watch?v=1RptHwfkx_k)
Also read:
- Microsoft's RAI resource center: [Responsible AI Resources Microsoft AI](https://www.microsoft.com/ai/responsible-ai-resources?activetab=pivot1%3aprimaryr4)
- Microsoft's FATE research group: [FATE: Fairness, Accountability, Transparency, and Ethics in AI - Microsoft Research](https://www.microsoft.com/research/theme/fate/)
Explore the Fairlearn toolkit
[Fairlearn](https://fairlearn.org/)
Read about Azure Machine Learning's tools to ensure fairness
- [Azure Machine Learning](https://docs.microsoft.com/azure/machine-learning/concept-fairness-ml?WT.mc_id=academic-77952-leestott)
## Assignment
[Explore Fairlearn](assignment.it.md)

@ -1,204 +0,0 @@
# Fairness in Machine Learning
![Summary of Fairness in Machine Learning in a sketchnote](../../../sketchnotes/ml-fairness.png)
> Sketchnote by [Tomomi Imura](https://www.twitter.com/girlie_mac)
## [Pre-lecture quiz](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/5?loc=ja)
## Introduction
In this curriculum, you will discover how machine learning is impacting our everyday lives. Even now, systems and models are involved in daily decision-making tasks, such as health care diagnoses or detecting fraud. So it is important that these models work well to provide fair outcomes for everyone.
But imagine what can happen when the data you are using to build these models lacks certain demographics, such as race, gender, political view, or religion, or disproportionately represents those demographics. What about when the model's output is interpreted to favor some demographic? What is the consequence for the application?
In this lesson, you will:
- Raise your awareness of the importance of fairness in machine learning.
- Learn about fairness-related harms.
- Learn about unfairness assessment and mitigation.
## Prerequisite
As a prerequisite, please take the "Responsible AI Principles" learn path and watch the video below on the topic:
Learn more about Responsible AI by following this [Learning Path](https://docs.microsoft.com/learn/modules/responsible-ai-principles/?WT.mc_id=academic-77952-leestott)
[![Microsoft's Approach to Responsible AI](https://img.youtube.com/vi/dnC8-uUZXSc/0.jpg)](https://youtu.be/dnC8-uUZXSc "Microsoft's Approach to Responsible AI")
> 🎥 Click the image above for a video: Microsoft's Approach to Responsible AI
## Unfairness in data and algorithms
> "If you torture the data long enough, it will confess to anything" - Ronald Coase
This statement sounds extreme, but it is true that data can be manipulated to support any conclusion. Such manipulation can sometimes happen unintentionally. As humans, we all have bias, and it is often difficult to consciously know when we are introducing bias into data.
Guaranteeing fairness in AI and machine learning remains a complex sociotechnical challenge, meaning that it cannot be addressed from either a purely social or a purely technical perspective.
### Fairness-related harms
What do we mean by unfairness? Unfairness encompasses negative impacts, or "harms", for a group of people, such as those defined in terms of race, gender, age, or disability status.
The main fairness-related harms can be classified as:
- **Allocation**, if a gender or an ethnicity, for example, is favored over another.
- **Quality of service**. If you train the data for one specific scenario but reality is much more complex, it leads to a poor performing service.
- **Stereotyping**. Associating a given group with pre-assigned attributes.
- **Denigration**. Unfairly criticizing and labeling something or someone.
- **Over- or under-representation**. The idea is that a certain group is not seen in a certain profession, and any service or function that keeps promoting that is contributing to harm.
Let's take a look at some examples.
### Allocation
Consider a hypothetical system for screening loan applications. The system tends to pick white men as better candidates over other groups. As a result, loans are withheld from certain applicants.
Another example would be an experimental hiring tool developed by a large corporation to screen candidates. The tool systematically discriminated against one gender by using models that were trained to prefer words associated with the other. It resulted in penalizing candidates whose resumes contained words such as "women's rugby team".
✅ Do a little research to find a real-world example of something like this
### Quality of service
Researchers found that several commercial gender classifiers had higher error rates around images of women with darker skin tones as opposed to images of men with lighter skin tones. [Reference](https://www.media.mit.edu/publications/gender-shades-intersectional-accuracy-disparities-in-commercial-gender-classification/)
Another infamous example is a hand soap dispenser that could not sense people with dark skin. [Reference](https://gizmodo.com/why-cant-this-soap-dispenser-identify-dark-skin-1797931773)
### Stereotyping
A stereotypical gender view was found in machine translation. When translating "he is a nurse and she is a doctor" into Turkish, problems were encountered. Turkish is a genderless language with a single third-person singular pronoun, "o", but translating the sentence back from Turkish to English yields the stereotypical and incorrect "she is a nurse and he is a doctor".
![translation to Turkish](../images/gender-bias-translate-en-tr.png)
![translation back to English](../images/gender-bias-translate-tr-en.png)
### Denigration
An image labeling technology infamously mislabeled images of dark-skinned people as gorillas. Mislabeling is harmful not just because the system made a mistake, but because it specifically applied a label with a long history of being purposefully used to denigrate Black people.
[![AI: Ain't I a Woman?](https://img.youtube.com/vi/QxuyfWoVV98/0.jpg)](https://www.youtube.com/watch?v=QxuyfWoVV98 "AI: Ain't I a Woman?")
> 🎥 Click the image above for a video: AI, Ain't I a Woman - a performance showing the harm caused by racist denigration by AI
### Over- or under-representation
Skewed image search results can be a good example of this harm. When searching for images of professions with an equal or higher percentage of men than women, such as engineering or CEO, notice results that are more heavily skewed towards a given gender.
![Bing CEO search](../images/ceos.png)
> This search on Bing for 'CEO' produces pretty inclusive results
These five main types of harm are not mutually exclusive, and a single system can exhibit more than one type of harm. In addition, each case varies in severity. For instance, unfairly labeling someone as a criminal is a much more severe harm than mislabeling an image. It is important, however, to remember that even relatively non-severe harms can make people feel alienated or singled out, and the cumulative impact can be extremely oppressive.
**Discussion**: Revisit some of the examples and see whether they show different harms.
|                         | Allocation | Quality of service | Stereotyping | Denigration | Over- or under-representation |
| ----------------------- | :--------: | :----------------: | :----------: | :---------: | :---------------------------: |
| Automated hiring system |     x      |         x          |      x       |             |               x               |
| Machine translation     |            |                    |              |             |                               |
| Photo labeling          |            |                    |              |             |                               |
## Detecting unfairness
There are many reasons why a given system behaves unfairly. Social biases, for example, might be reflected in the datasets used to train it. For example, hiring unfairness might have been exacerbated by over-reliance on historical data. Using patterns in resumes submitted to a company over a 10-year period, a model determined that men were more qualified because the majority of the resumes came from men, a reflection of past male dominance in the tech industry.
Inadequate data about a certain group of people can be a reason for unfairness. For example, image classifiers have a higher rate of error for images of dark-skinned people because darker skin tones were underrepresented in the data.
Wrong assumptions made during development cause unfairness too. For example, a facial analysis system intended to predict who is going to commit a crime based on images of people's faces can lead to damaging assumptions. This could lead to substantial harm for misclassified people.
## Understand your models and build in fairness
Although many aspects of fairness are not captured by quantitative fairness metrics, and it is not possible to fully remove bias from a system to guarantee fairness, you are still responsible for detecting and mitigating fairness issues as much as possible.
When you are working with machine learning models, it is important to understand your models by ensuring their interpretability and by assessing and mitigating unfairness.
Let's use the loan selection example to isolate the case and figure out each factor's level of impact on the prediction.
## Assessment methods
1. **Identify harms (and benefits)**. The first step is to identify harms and benefits. Think about how actions and decisions can affect both potential customers and the business itself.
1. **Identify the affected groups**. Once you understand what kinds of harms or benefits can occur, identify the groups that may be affected. Are these groups defined by gender, ethnicity, or social group?
1. **Define fairness metrics**. Finally, define a metric so you have something to measure against in your work to improve the situation.
### Identify harms (and benefits)
What are the harms and benefits associated with lending? Think about false negative and false positive scenarios:
**False negatives** (rejected, but Y=1) - in this case, an applicant who would be able to repay a loan is rejected. This is an adverse event because loans are withheld from qualified applicants.
**False positives** (accepted, but Y=0) - in this case, the applicant does get a loan but eventually defaults. As a result, the applicant's case will be sent to a debt collection agency, which can affect their future loan applications.
### Identify the affected groups
The next step is to determine which groups are likely to be affected. For example, in the case of a credit card application, a model might determine that women should receive much lower credit limits than their spouses who share household assets. An entire demographic, defined by gender, is thereby affected.
### Define fairness metrics
You have identified harms and an affected group, in this case defined by gender. Now, use the quantified factors to disaggregate their metrics. For example, using the data below, you can see that women have the largest false positive rate and men have the smallest, and that the opposite is true for false negatives.
✅ In a future lesson on clustering, you will see how to build this 'confusion matrix' in code
|            | False positive rate | False negative rate | Count |
| ---------- | ------------------- | ------------------- | ----- |
| Women      | 0.37                | 0.27                | 54032 |
| Men        | 0.31                | 0.35                | 28620 |
| Non-binary | 0.33                | 0.31                | 1266  |
This table tells us several things. First, we note that there are comparatively few non-binary people in the data. The data is skewed, so we need to be careful how we interpret these numbers.
In this case, we have 3 groups and 2 metrics. When thinking about how our system affects this group of loan-applying customers, this may be sufficient; but if you want to define a larger number of groups, you may want to distill this down to smaller sets of summaries. To do that, you can add more metrics, such as the largest difference or the smallest ratio of each false negative and false positive rate.
✅ Stop and think: What other groups might be affected by the loan application?
## Mitigating unfairness
To mitigate unfairness, explore the model to generate various mitigated models, and compare the trade-offs each makes between accuracy and fairness to select the fairest model.
This introductory lesson does not dive deeply into the details of algorithmic unfairness mitigation, such as the post-processing and reductions approaches, but here is a tool you may want to try.
### Fairlearn
[Fairlearn](https://fairlearn.github.io/) is an open-source Python package that allows you to assess your systems' fairness and mitigate unfairness.
The tool helps you assess how a model's predictions affect different groups, lets you compare multiple models using fairness and performance metrics, and provides a set of algorithms to mitigate unfairness in binary classification and regression.
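That comparison step might look like the following sketch, which scores two illustrative models on a performance metric (accuracy) and a fairness metric (equalized-odds gap) over synthetic data; neither the data nor the model choices are prescribed by the lesson:

```python
# Sketch: compare candidate models on a performance metric and a fairness
# metric side by side. Data and model choices are illustrative only.
import numpy as np
from fairlearn.metrics import equalized_odds_difference
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(800, 5))
y = (X[:, 0] - X[:, 1] + rng.normal(size=800) > 0).astype(int)
group = rng.choice(["a", "b"], size=800)

for model in (LogisticRegression(), DecisionTreeClassifier(max_depth=3)):
    pred = model.fit(X, y).predict(X)
    gap = equalized_odds_difference(y, pred, sensitive_features=group)
    print(f"{type(model).__name__}: accuracy={accuracy_score(y, pred):.3f}, "
          f"equalized-odds gap={gap:.3f}")
```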
- Learn how to use the different components by checking out Fairlearn's [GitHub](https://github.com/fairlearn/fairlearn/)
- Explore the [user guide](https://fairlearn.github.io/main/user_guide/index.html) and the [examples](https://fairlearn.github.io/main/auto_examples/index.html)
- Try some [sample notebooks](https://github.com/fairlearn/fairlearn/tree/master/notebooks).
- Learn [how to enable fairness assessments](https://docs.microsoft.com/azure/machine-learning/how-to-machine-learning-fairness-aml?WT.mc_id=academic-77952-leestott) of machine learning models in Azure Machine Learning.
- Check out these [sample notebooks](https://github.com/Azure/MachineLearningNotebooks/tree/master/contrib/fairness) for more fairness assessment scenarios in Azure Machine Learning.
---
## 🚀 Challenge
To prevent biases from being introduced in the first place, we should:
- have a diversity of backgrounds and perspectives among the people working on systems
- invest in datasets that reflect the diversity of our society
- develop better methods for detecting and correcting bias when it occurs
Think about real-life scenarios where unfairness is evident in model building and usage. What else should we consider?
## [Post-lecture quiz](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/6?loc=ja)
## Review & Self Study
In this lesson, you have learned some basics of the concepts of fairness and unfairness in machine learning.
Watch this workshop to dive deeper into the topics:
- YouTube: Fairness-related harms in AI systems: Examples, assessment, and mitigation by Hanna Wallach and Miro Dudik [Fairness-related harms in AI systems: Examples, assessment, and mitigation - YouTube](https://www.youtube.com/watch?v=1RptHwfkx_k)
- Microsoft's RAI resource center: [Responsible AI Resources Microsoft AI](https://www.microsoft.com/ai/responsible-ai-resources?activetab=pivot1%3aprimaryr4)
- Microsoft's FATE research group: [FATE: Fairness, Accountability, Transparency, and Ethics in AI - Microsoft Research](https://www.microsoft.com/research/theme/fate/)
Explore the Fairlearn toolkit
- [Fairlearn](https://fairlearn.org/)
Read about Azure Machine Learning's tools to ensure fairness
- [Azure Machine Learning](https://docs.microsoft.com/azure/machine-learning/concept-fairness-ml?WT.mc_id=academic-77952-leestott)
## Assignment
[Explore Fairlearn](./assignment.ja.md)

@ -1,214 +0,0 @@
# Fairness in Machine Learning
![Summary of Fairness in Machine Learning in a sketchnote](../../../sketchnotes/ml-fairness.png)
> Sketchnote by [Tomomi Imura](https://www.twitter.com/girlie_mac)
## [Pre-lecture quiz](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/5/)
## Introduction
In this curriculum, you will discover how machine learning can and is impacting our everyday lives. Even now, systems and models are involved in daily decision-making tasks, such as health care diagnoses or detecting fraud. So it is important that these models work well to provide fair outcomes for everyone.
Imagine what can happen when the data you are using to build these models lacks certain demographics, such as race, gender, political view, or religion, or disproportionately represents those demographics. What about when the model's output is interpreted to favor some demographic? What is the consequence for the application?
In this lesson, you will:
- Raise your awareness of the importance of fairness in machine learning.
- Learn about fairness-related harms.
- Learn about unfairness assessment and mitigation.
## Prerequisite
As a prerequisite, please take the "Responsible AI Principles" learn path and watch the video below on the topic:
Learn more about Responsible AI by following this [Learning Path](https://docs.microsoft.com/learn/modules/responsible-ai-principles/?WT.mc_id=academic-77952-leestott)
[![Microsoft's Approach to Responsible AI](https://img.youtube.com/vi/dnC8-uUZXSc/0.jpg)](https://youtu.be/dnC8-uUZXSc "Microsoft's Approach to Responsible AI")
> 🎥 Click the image above for a video: Microsoft's Approach to Responsible AI
## Unfairness in data and algorithms
> "If you torture the data long enough, it will confess to anything" - Ronald Coase
This statement sounds extreme, but it is true that data can be manipulated to support any conclusion. Such manipulation can sometimes happen unintentionally. As humans, we all have bias, and it is often difficult to consciously know when we are introducing bias into data.
Guaranteeing fairness in AI and machine learning remains a complex sociotechnical challenge, meaning that it cannot be addressed from either a purely social or a purely technical perspective.
### Fairness-related harms
What do we mean by unfairness? "Unfairness" encompasses negative impacts, or "harms", for a group of people, such as those defined in terms of race, gender, age, or disability status.
The main fairness-related harms can be classified as:
- **Allocation**, if a gender or an ethnicity, for example, is favored over another.
- **Quality of service**. If you train the data for one specific scenario but reality is much more complex, it leads to a poor performing service.
- **Stereotyping**. Associating a given group with pre-assigned attributes.
- **Denigration**. Unfairly criticizing and labeling something or someone.
- **Over- or under-representation**. The idea is that a certain group is not seen in a certain profession, and any service or function that keeps promoting that is contributing to harm.
Let's take a look at some examples.
### Allocation
Consider a hypothetical system for screening loan applications. The system tends to pick white men as better candidates over other groups. As a result, loans are withheld from certain applicants.
Another example would be an experimental hiring tool developed by a large corporation to screen candidates. The tool systematically discriminated against one gender by using models that were trained to prefer words associated with the other. It resulted in penalizing candidates whose resumes contained words such as "women's rugby team".
✅ Do a little research to find a real-world example of something like this
### Quality of service
Researchers found that several commercial gender classifiers had higher error rates around images of women with darker skin tones as opposed to images of men with lighter skin tones. [Reference](https://www.media.mit.edu/publications/gender-shades-intersectional-accuracy-disparities-in-commercial-gender-classification/)
Another infamous example is a hand soap dispenser that did not seem to be able to sense people with dark skin. [Reference](https://gizmodo.com/why-cant-this-soap-dispenser-identify-dark-skin-1797931773)
### Stereotyping
A stereotypical gender view was found in machine translation. When translating "he is a nurse and she is a doctor" into Turkish, problems were encountered. Turkish is a genderless language with a single third-person singular pronoun, "o", but translating the sentence back from Turkish to English yields the stereotypical and incorrect "she is a nurse and he is a doctor".
![translation to Turkish](../images/gender-bias-translate-en-tr.png)
![translation back to English](../images/gender-bias-translate-tr-en.png)
### Denigration
An image labeling technology infamously mislabeled images of dark-skinned people as gorillas. Mislabeling is harmful not just because the system made a mistake, but because it specifically applied a label with a long history of being purposefully used to denigrate Black people.
[![AI: Ain't I a Woman?](https://img.youtube.com/vi/QxuyfWoVV98/0.jpg)](https://www.youtube.com/watch?v=QxuyfWoVV98 "AI, Ain't I a Woman?")
> 🎥 Click the image above for a video: AI, Ain't I a Woman - a performance showing the harm caused by racist denigration by AI
### Over- or under-representation
Skewed image search results can be a good example of this harm. When searching for images of professions with an equal or higher percentage of men than women, such as engineering or CEO, notice results that are more heavily skewed towards a given gender.
![Bing CEO search](../images/ceos.png)
> This search on Bing for 'CEO' produces pretty inclusive results
These five main types of harm are not mutually exclusive, and a single system can exhibit more than one type of harm. In addition, each case varies in severity. For instance, unfairly labeling someone as a criminal is a much more severe harm than mislabeling an image. It is important, however, to remember that even relatively non-severe harms can make people feel alienated or singled out, and the cumulative impact can be extremely oppressive.
**Discussion**: Revisit some of the examples and see whether they show different harms.
|                         | Allocation | Quality of service | Stereotyping | Denigration | Over- or under-representation |
| ----------------------- | :--------: | :----------------: | :----------: | :---------: | :---------------------------: |
| Automated hiring system |     x      |         x          |      x       |             |               x               |
| Machine translation     |            |                    |              |             |                               |
| Photo labeling          |            |                    |              |             |                               |
## Detecting unfairness
There are many reasons why a given system behaves unfairly. Social biases, for example, might be reflected in the datasets used to train it. For example, hiring unfairness might have been exacerbated by over-reliance on historical data. Using patterns in resumes submitted to a company over a 10-year period, a model determined that men were more qualified because the majority of the resumes came from men, a reflection of past male dominance in the tech industry.
Inadequate data about a certain group of people can be a reason for unfairness. For example, image classifiers have a higher rate of error for images of dark-skinned people because darker skin tones were underrepresented in the data.
Wrong assumptions made during development cause unfairness too. For example, a facial analysis system intended to predict who is going to commit a crime based on images of people's faces can lead to damaging assumptions. This could lead to substantial harm for misclassified people.
## Understand your models and build in fairness
Although many aspects of fairness are not captured by quantitative fairness metrics, and it is not possible to fully remove bias from a system to guarantee fairness, you are still responsible for detecting and mitigating fairness issues as much as possible.
When you are working with machine learning models, it is important to understand your models by ensuring their interpretability and by assessing and mitigating unfairness.
Let's use the loan selection example to isolate the case and figure out each factor's level of impact on the prediction.
## Assessment methods
1. **Identify harms (and benefits)**. The first step is to identify harms and benefits. Think about how actions and decisions can affect both potential customers and the business itself.
1. **Identify the affected groups**. Once you understand what kinds of harms or benefits can occur, identify the groups that may be affected. Are these groups defined by gender, ethnicity, or social group?
1. **Define fairness metrics**. Finally, define a metric so you have something to measure against in your work to improve the situation.
### Identify harms (and benefits)
What are the harms and benefits associated with lending? Think about false negative and false positive scenarios:
**False negatives** (rejected, but Y=1) - in this case, an applicant who would be able to repay a loan is rejected. This is an adverse event because loans are withheld from qualified applicants.
**False positives** (accepted, but Y=0) - in this case, the applicant does get a loan but eventually defaults. As a result, the applicant's case will be sent to a debt collection agency, which can affect their future loan applications.
### Identify the affected groups
The next step is to determine which groups are likely to be affected. For example, in the case of a credit card application, a model might determine that women should receive much lower credit limits than their spouses who share household assets. An entire demographic, defined by gender, is thereby affected.
### Define fairness metrics
You have identified harms and an affected group, in this case defined by gender. Now, use the quantified factors to disaggregate their metrics. For example, using the data below, you can see that women have the largest false positive rate and men have the smallest, and that the opposite is true for false negatives.
✅ In a future lesson on clustering, you will see how to build this 'confusion matrix' in code
|            | False positive rate | False negative rate | Count |
| ---------- | ------------------- | ------------------- | ----- |
| Women      | 0.37                | 0.27                | 54032 |
| Men        | 0.31                | 0.35                | 28620 |
| Non-binary | 0.33                | 0.31                | 1266  |
This table tells us several things. First, we note that there are comparatively few non-binary people in the data. The data is skewed, so we need to be careful how we interpret these numbers.
In this case, we have 3 groups and 2 metrics. When thinking about how our system affects this group of loan-applying customers, this may be sufficient; but if you want to define a larger number of groups, you may want to distill this down to smaller sets of summaries. To do that, you can add more metrics, such as the largest difference or the smallest ratio of each false negative and false positive rate.
✅ Stop and think: What other groups might be affected by the loan application?
## Mitigating unfairness
To mitigate unfairness, explore the model to generate various mitigated models, and compare the trade-offs each makes between accuracy and fairness to select the fairest model.
This introductory lesson does not dive deeply into the details of algorithmic unfairness mitigation, such as the post-processing and reductions approaches, but here is a tool you may want to try.
### Fairlearn
[Fairlearn](https://fairlearn.github.io/) is an open-source Python package that allows you to assess your systems' fairness and mitigate unfairness.
The tool helps you assess how a model's predictions affect different groups, lets you compare multiple models using fairness and performance metrics, and provides a set of algorithms to mitigate unfairness in binary classification and regression.
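For instance, one per-group view Fairlearn offers is the selection rate, i.e. the fraction of applicants a model approves, broken out by a sensitive feature. The sketch below uses synthetic data, so the printed rates are illustrative only:

```python
# Sketch: selection rate (fraction approved) disaggregated by group.
import numpy as np
from fairlearn.metrics import MetricFrame, selection_rate

rng = np.random.default_rng(3)
group = rng.choice(["women", "men"], size=400)
y_true = rng.integers(0, 2, size=400)
y_pred = rng.integers(0, 2, size=400)  # 1 = loan approved

mf = MetricFrame(metrics=selection_rate, y_true=y_true, y_pred=y_pred,
                 sensitive_features=group)
print(mf.overall)   # overall approval rate
print(mf.by_group)  # approval rate per group
```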
- Learn how to use the different components by checking out Fairlearn's [GitHub](https://github.com/fairlearn/fairlearn/)
- Explore the [user guide](https://fairlearn.github.io/main/user_guide/index.html) and the [examples](https://fairlearn.github.io/main/auto_examples/index.html)
- Try some [sample notebooks](https://github.com/fairlearn/fairlearn/tree/master/notebooks).
- Learn [how to enable fairness assessments](https://docs.microsoft.com/azure/machine-learning/how-to-machine-learning-fairness-aml?WT.mc_id=academic-77952-leestott) of machine learning models in Azure Machine Learning.
- Check out these [sample notebooks](https://github.com/Azure/MachineLearningNotebooks/tree/master/contrib/fairness) for more fairness assessment scenarios in Azure Machine Learning.
---
## 🚀 Challenge
To prevent biases from being introduced in the first place, we should:
- have a diversity of backgrounds and perspectives among the people working on systems
- invest in datasets that reflect the diversity of our society
- develop better methods for detecting and correcting bias when it occurs
Think about real-life scenarios where unfairness is evident in model building and usage. What else should we consider?
## [Post-lecture quiz](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/6/)
## Review & Self Study
In this lesson, you have learned some basics of the concepts of fairness and unfairness in machine learning.
Watch this workshop to dive deeper into the topics:
- YouTube: Fairness-related harms in AI systems: Examples, assessment, and mitigation by Hanna Wallach and Miro Dudik [Fairness-related harms in AI systems: Examples, assessment, and mitigation - YouTube](https://www.youtube.com/watch?v=1RptHwfkx_k)
Also, read:
- Microsoft's RAI resource center: [Responsible AI Resources Microsoft AI](https://www.microsoft.com/ai/responsible-ai-resources?activetab=pivot1%3aprimaryr4)
- Microsoft's FATE research group: [FATE: Fairness, Accountability, Transparency, and Ethics in AI - Microsoft Research](https://www.microsoft.com/research/theme/fate/)
Explore the Fairlearn toolkit
[Fairlearn](https://fairlearn.org/)
Read about Azure Machine Learning's tools to ensure fairness
- [Azure Machine Learning](https://docs.microsoft.com/azure/machine-learning/concept-fairness-ml?WT.mc_id=academic-77952-leestott)
## Assignment
[Explore Fairlearn](../assignment.md)

@ -1,211 +0,0 @@
# Equidade no Machine Learning
![Resumo de imparcialidade no Machine Learning em um sketchnote](../../../sketchnotes/ml-fairness.png)
> Sketchnote por [Tomomi Imura](https://www.twitter.com/girlie_mac)
## [Teste pré-aula](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/5?loc=ptbr)
## Introdução
Neste curso, você começará a descobrir como o machine learning pode e está impactando nosso dia a dia. Mesmo agora, os sistemas e modelos estão envolvidos nas tarefas diárias de tomada de decisão, como diagnósticos de saúde ou detecção de fraudes. Portanto, é importante que esses modelos funcionem bem para fornecer resultados justos para todos.
Imagine o que pode acontecer quando os dados que você está usando para construir esses modelos não têm certos dados demográficos, como raça, gênero, visão política, religião ou representam desproporcionalmente esses dados demográficos. E quando a saída do modelo é interpretada para favorecer alguns dados demográficos? Qual é a consequência para a aplicação?
Nesta lição, você irá:
- Aumentar sua consciência sobre a importância da imparcialidade no machine learning.
- Aprender sobre danos relacionados à justiça.
- Aprender sobre avaliação e mitigação de injustiças.
## Pré-requisito
Como pré-requisito, siga o Caminho de aprendizagem "Princípios de AI responsável" e assista ao vídeo abaixo sobre o tópico:
Saiba mais sobre a AI responsável seguindo este [Caminho de aprendizagem](https://docs.microsoft.com/learn/modules/responsible-ai-principles/?WT.mc_id=academic-77952-leestott)
[![Abordagem da Microsoft para AI responsável](https://img.youtube.com/vi/dnC8-uUZXSc/0.jpg)](https://youtu.be/dnC8-uUZXSc "Abordagem da Microsoft para AI responsável")
> 🎥 Clique na imagem acima para ver um vídeo: Abordagem da Microsoft para AI responsável
## Injustiça em dados e algoritmos
> "Se você torturar os dados por tempo suficiente, eles confessarão qualquer coisa" - Ronald Coase
Essa afirmação parece extrema, mas é verdade que os dados podem ser manipulados para apoiar qualquer conclusão. Essa manipulação às vezes pode acontecer de forma não intencional. Como humanos, todos nós temos preconceitos e muitas vezes é difícil saber conscientemente quando você está introduzindo preconceitos nos dados.
Garantir a justiça na AI e no machine learning continua sendo um desafio sociotécnico complexo. O que significa que não pode ser abordado de perspectivas puramente sociais ou técnicas.
### Danos relacionados à justiça
O que você quer dizer com injustiça? “Injustiça” abrange impactos negativos, ou “danos”, para um grupo de pessoas, tais como aqueles definidos em termos de raça, sexo, idade ou condição de deficiência.
Os principais danos relacionados à justiça podem ser classificados como:
- **Alocação**, se um gênero ou etnia, por exemplo, for favorecido em relação a outro.
- **Qualidade de serviço**. Se você treinar os dados para um cenário específico, mas a realidade for muito mais complexa, isso levará a um serviço de baixo desempenho.
- **Estereotipagem**. Associar um determinado grupo a atributos pré-atribuídos.
- **Difamação**. Criticar e rotular injustamente algo ou alguém..
- **Excesso ou falta de representação**. A ideia é que determinado grupo não seja visto em determinada profissão, e qualquer serviço ou função que continue promovendo isso está contribuindo para o mal.
Vamos dar uma olhada nos exemplos.
### Alocação
Considere um sistema hipotético para examinar os pedidos de empréstimo. O sistema tende a escolher homens brancos como melhores candidatos em relação a outros grupos. Como resultado, os empréstimos são negados a certos candidatos.
Outro exemplo seria uma ferramenta de contratação experimental desenvolvida por uma grande empresa para selecionar candidatos. A ferramenta discriminou sistematicamente um gênero por meio dos modelos foram treinados para preferir palavras associadas a outro. Isso resultou na penalização de candidatos cujos currículos continham palavras como "time feminino de rúgbi".
✅ Faça uma pequena pesquisa para encontrar um exemplo do mundo real de algo assim
### Qualidade de serviço
Os pesquisadores descobriram que vários classificadores comerciais de gênero apresentavam taxas de erro mais altas em imagens de mulheres com tons de pele mais escuros, em oposição a imagens de homens com tons de pele mais claros. [Referência](https://www.media.mit.edu/publications/gender-shades-intersectional-accuracy-disparities-in-commercial-gender-classification/)
Outro exemplo infame é um distribuidor de sabonete para as mãos que parecia não ser capaz de detectar pessoas com pele escura. [Referência](https://gizmodo.com/why-cant-this-soap-dispenser-identify-dark-skin-1797931773)
### Estereotipagem
Visão de gênero estereotípica foi encontrada na tradução automática. Ao traduzir “ele é enfermeiro e ela médica” para o turco, foram encontrados problemas. Turco é uma língua sem gênero que tem um pronome, “o” para transmitir uma terceira pessoa do singular, mas traduzir a frase de volta do turco para o inglês resulta no estereótipo e incorreto como “ela é uma enfermeira e ele é um médico”.
![translation to Turkish](../images/gender-bias-translate-en-tr.png)
![translation back to English](../images/gender-bias-translate-tr-en.png)
### Denigration
An image labeling technology infamously mislabeled images of dark-skinned people as gorillas. Mislabeling is harmful not just because the system made a mistake, but because it specifically applied a label with a long history of being purposefully used to denigrate Black people.
[![AI: Ain't I a Woman?](https://img.youtube.com/vi/QxuyfWoVV98/0.jpg)](https://www.youtube.com/watch?v=QxuyfWoVV98 "AI, Ain't I a Woman?")
> 🎥 Click the image above for a video: AI, Ain't I a Woman - a performance showing the harm caused by racist denigration by AI
### Over- or under-representation
Skewed image search results can be a good example of this harm. When searching for images of professions with an equal or higher percentage of men than women, such as engineering or CEO, notice how the results lean more heavily toward a given gender.
![Bing CEO search](../images/ceos.png)
> This search on Bing for 'CEO' produces fairly inclusive results
These five main types of harm are not mutually exclusive, and a single system can exhibit more than one type of harm. In addition, each case varies in its severity. For instance, unfairly labeling someone as a criminal is a much more severe harm than mislabeling an image. It is important, however, to remember that even relatively non-severe harms can make people feel alienated or singled out, and the cumulative impact can be extremely oppressive.
**Discussion**: Revisit some of the examples and see whether they show different harms.
|                         | Allocation | Quality of service | Stereotyping | Denigration | Over- or under-representation |
| ----------------------- | :--------: | :----------------: | :----------: | :---------: | :---------------------------: |
| Automated hiring system | x          | x                  | x            |             | x                             |
| Machine translation     |            |                    |              |             |                               |
| Photo labeling          |            |                    |              |             |                               |
## Detecting unfairness
There are many reasons why a given system behaves unfairly. Social biases, for example, might be reflected in the datasets used to train it. For instance, hiring unfairness might have been exacerbated by over-reliance on historical data. By using patterns in résumés submitted to the company over a 10-year period, the model determined that men were more qualified because the majority of résumés came from men, a reflection of past male dominance in the tech industry.
Inadequate data about a certain group of people can be a reason for unfairness. For example, image classifiers have a higher rate of error for images of dark-skinned people because darker skin tones were underrepresented in the data.
Wrong assumptions made during development also cause unfairness. For example, a facial analysis system intended to predict who is going to commit a crime based on images of people's faces can lead to damaging assumptions. This could lead to substantial harm for people who are misclassified.
## Understand your models and build in fairness
Even though many aspects of fairness are not captured by quantitative fairness metrics, and it is not possible to fully remove bias from a system to guarantee fairness, you are still responsible for detecting and mitigating fairness issues as much as possible.
When you are working with machine learning models, it is important to understand your models by ensuring their interpretability and by assessing and mitigating unfairness.
Let's use the loan selection example to isolate the case and figure out each factor's level of impact on the prediction.
## Assessment methods
1. **Identify harms (and benefits)**. The first step is to identify harms and benefits. Think about how actions and decisions can affect both potential customers and the business itself.
2. **Identify the affected groups**. Once you understand what kind of harms or benefits can occur, identify the groups that may be affected. Are these groups defined by gender, ethnicity, or social group?
3. **Define fairness metrics**. Finally, define a metric so you have something to measure against in your work to improve the situation.
### Identify harms (and benefits)
What are the harms and benefits associated with lending? Think about false negative and false positive scenarios:
**False negatives** (reject, but Y=1) - in this case, an applicant who would be able to repay a loan is rejected. This is an adverse event because loan resources are withheld from qualified applicants.
**False positives** (accept, but Y=0) - in this case, the applicant does get a loan but eventually defaults. As a result, the applicant's case will be sent to a debt collection agency, which can affect their future loan applications.
### Identify affected groups
The next step is to determine which groups are likely to be affected. For example, in the case of a credit card application, a model might determine that women should receive much lower credit limits than their spouses who share household assets. An entire demographic, defined by gender, is thereby affected.
### Define fairness metrics
You have identified harms and an affected group, in this case delineated by gender. Now, use the quantified factors to disaggregate their metrics. For example, using the data below, you can see that women have the largest false positive rate and men the smallest, and that the opposite is true for false negatives.
✅ In a future lesson on Clustering, you will see how to build this 'confusion matrix' in code
|            | False positive rate | False negative rate | count |
| ---------- | ------------------- | ------------------- | ----- |
| Women      | 0.37                | 0.27                | 54032 |
| Men        | 0.31                | 0.35                | 28620 |
| Non-binary | 0.33                | 0.31                | 1266  |
This table tells us several things. First, we note that there are comparatively few non-binary people in the data. The data is skewed, so you need to be careful interpreting these numbers.
In this case, we have 3 groups and 2 metrics. When we are thinking about how our system affects the group of customers applying for loans, this may be sufficient, but when you want to define a larger number of groups, you may want to distill this down to smaller sets of summaries. To do that, you can add more metrics, such as the largest difference or smallest ratio of each false negative and false positive rate.
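To make this concrete, here is a minimal sketch of how such disaggregated metrics could be computed with Fairlearn's `MetricFrame` (available in Fairlearn 0.5 and later); the labels, predictions, and `gender` values below are made-up stand-ins, not the data from the table above.

```python
import pandas as pd
from fairlearn.metrics import MetricFrame, false_positive_rate, false_negative_rate

# Hypothetical stand-in data: 1 = applicant repaid, predictions = accept/reject
y_true = pd.Series([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = pd.Series([1, 1, 0, 1, 0, 1, 1, 0])
gender = pd.Series(["female", "male", "female", "male",
                    "female", "male", "female", "male"])

mf = MetricFrame(
    metrics={"false positive rate": false_positive_rate,
             "false negative rate": false_negative_rate},
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=gender,
)

print(mf.by_group)                             # one row per group, as in the table
print(mf.difference(method="between_groups"))  # largest gap between any two groups
print(mf.ratio(method="between_groups"))       # smallest ratio between any two groups
```

The `difference` and `ratio` summaries are exactly the kind of "largest difference or smallest ratio" roll-ups described above for handling many groups at once.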
✅ Stop and think: What other groups are likely to be affected by the loan application?
## Mitigating unfairness
To mitigate unfairness, explore the model to generate various mitigated models, and compare the trade-offs each makes between accuracy and fairness to select the fairest model.
This introductory lesson does not dive deeply into the details of algorithmic unfairness mitigation, such as the post-processing and reductions approaches, but here is a tool you may want to try.
### Fairlearn
[Fairlearn](https://fairlearn.github.io/) is an open-source Python package that allows you to assess your systems' fairness and mitigate unfairness.
The tool helps you assess how a model's predictions affect different groups, enabling you to compare multiple models by using fairness and performance metrics, and supplying a set of algorithms to mitigate unfairness in binary classification and regression.
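As a taste of the reductions approach mentioned above, here is a hedged sketch on synthetic data; the generated dataset and the `DemographicParity` constraint are illustrative assumptions, not a recipe for a real lending scenario.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

# Synthetic stand-in for a loan dataset, with a randomly assigned sensitive feature
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
gender = np.random.default_rng(0).choice(["female", "male"], size=500)

# Wrap a base estimator in a fairness constraint; the reduction searches for
# a model that trades accuracy against the constraint
mitigator = ExponentiatedGradient(
    LogisticRegression(solver="liblinear"),
    constraints=DemographicParity(),
)
mitigator.fit(X, y, sensitive_features=gender)
y_pred_mitigated = mitigator.predict(X)

# Fairlearn's postprocessing module (e.g. ThresholdOptimizer) covers the
# post-processing family of mitigations mentioned above
```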
- Learn how to use the different components by checking out the Fairlearn's [GitHub](https://github.com/fairlearn/fairlearn/)
- Explore the [user guide](https://fairlearn.github.io/main/user_guide/index.html) and [examples](https://fairlearn.github.io/main/auto_examples/index.html)
- Try some [sample notebooks](https://github.com/fairlearn/fairlearn/tree/master/notebooks).
- Learn [how to enable fairness assessments](https://docs.microsoft.com/azure/machine-learning/how-to-machine-learning-fairness-aml?WT.mc_id=academic-77952-leestott) of machine learning models in Azure Machine Learning.
- Check out these [sample notebooks](https://github.com/Azure/MachineLearningNotebooks/tree/master/contrib/fairness) for more fairness assessment scenarios in Azure Machine Learning.
---
## 🚀 Challenge
To prevent biases from being introduced in the first place, we should:
- have a diversity of experiences and perspectives among the people working on systems
- invest in datasets that reflect the diversity of our society
- develop better methods for detecting and correcting bias when it occurs
Think about real-life scenarios where unfairness is evident in model-building and usage. What else should we consider?
## [Post-lecture quiz](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/6?loc=ptbr)
## Review & Self Study
In this lesson, you have learned some basics of the concepts of fairness and unfairness in machine learning.
Watch this workshop to dive deeper into the topics:
- YouTube: Fairness-related harms in AI systems: Examples, assessment, and mitigation by Hanna Wallach and Miro Dudik [Fairness-related harms in AI systems: Examples, assessment, and mitigation - YouTube](https://www.youtube.com/watch?v=1RptHwfkx_k)
Also, read:
- Microsoft's RAI resource center: [Responsible AI Resources Microsoft AI](https://www.microsoft.com/ai/responsible-ai-resources?activetab=pivot1%3aprimaryr4)
- Microsoft's FATE research group: [FATE: Fairness, Accountability, Transparency, and Ethics in AI - Microsoft Research](https://www.microsoft.com/research/theme/fate/)
Explore the Fairlearn toolkit
[Fairlearn](https://fairlearn.org/)
Read about Azure Machine Learning's tools to ensure fairness
- [Azure Machine Learning](https://docs.microsoft.com/azure/machine-learning/concept-fairness-ml?WT.mc_id=academic-77952-leestott)
## Assignment
[Explore Fairlearn](assignment.pt-br.md)

@ -1,11 +0,0 @@
# Explore Fairlearn
## Instructions
In this lesson, you learned about Fairlearn, an "open-source, community-driven project to help data scientists improve the fairness of AI systems." For this assignment, explore one of Fairlearn's [notebooks](https://fairlearn.org/v0.6.2/auto_examples/index.html) and report your findings in a paper or presentation.
## Rubric
| Criteria | Exemplary | Adequate | Needs Improvement |
| -------- | --------- | -------- | ----------------- |
| | A paper or PowerPoint presentation is presented discussing Fairlearn's systems, the notebook that was run, and the conclusions drawn from running it | A paper is presented without conclusions | No paper is presented |

@ -0,0 +1,171 @@
# Postscript: Model Debugging in Machine Learning using Responsible AI dashboard components
## [Pre-lecture quiz](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/5/)
## Introduction
Machine learning impacts our everyday lives. AI is finding its way into some of the most important systems that affect us as individuals and as a society, from healthcare, finance, and education to employment. For instance, systems and models are involved in daily decision-making tasks, such as health care diagnoses or fraud detection. Consequently, advances in AI, along with its accelerated adoption, are being met with evolving societal expectations and growing regulation in response. We constantly see areas where AI systems continue to miss expectations and expose new challenges, and governments are starting to regulate AI solutions. So, it is important that these models are analyzed to provide fair, reliable, inclusive, transparent, and accountable outcomes for everyone.
In this curriculum, we will look at practical tools that can be used to assess whether a model has responsible AI issues. Traditional machine learning debugging techniques tend to be based on quantitative calculations such as aggregated accuracy or average error loss. Imagine what can happen when the data you are using to build these models lacks certain demographics, such as race, gender, political view, or religion, or disproportionately represents such demographics. What about when the model's output is interpreted to favor some demographic? This can introduce an over- or under-representation of these sensitive feature groups, resulting in fairness, inclusiveness, or reliability issues in the model. Another factor is that machine learning models are often considered black boxes, which makes it hard to understand and explain what drives a model's predictions. All of these are challenges data scientists and AI developers face when they do not have adequate tools to debug and assess the fairness or trustworthiness of a model.
In this lesson, you will learn about debugging your models using:
- **Error Analysis**: identify where in your data distribution the model has high error rates.
- **Model Overview**: perform comparative analysis across different data cohorts to discover disparities in your model's performance metrics.
- **Data Analysis**: investigate where there could be over- or under-representation in your data that can skew your model to favor one data demographic over another.
- **Feature Importance**: understand which features are driving your model's predictions at a global or local level.
## Prerequisite
As a prerequisite, please review [Responsible AI tools for developers](https://www.microsoft.com/ai/ai-lab-responsible-ai-dashboard)
> ![Gif on Responsible AI Tools](./images/rai-overview.gif)
## Error Analysis
Traditional model performance metrics used for measuring accuracy are mostly calculations based on correct vs incorrect predictions. For example, determining that a model is accurate 89% of the time with an error loss of 0.001 can be considered good performance. However, errors are often not distributed uniformly in your underlying dataset. You may get an 89% model accuracy score but discover that there are regions of your data for which the model is failing 42% of the time. The consequence of these failure patterns with certain data groups can lead to fairness or reliability issues. It is essential to understand the areas where the model is performing well or not. The data regions where there are a high number of inaccuracies in your model may turn out to be an important data demographic.
![Analyze and debug model errors](./images/ea-error-distribution.png)
The Error Analysis component on the RAI dashboard illustrates how model failure is distributed across various cohorts with a tree visualization. This is useful for identifying features or areas where there is a high error rate in your dataset. By seeing where most of the model's inaccuracies are coming from, you can start investigating the root cause. You can also create cohorts of data to perform analysis on. These data cohorts help in the debugging process to determine why model performance is good in one cohort but erroneous in another.
![Error Analysis](./images/ea-error-cohort.png)
The visual indicators on the tree map help in locating the problem areas quicker. For instance, the darker shade of red color a tree node has, the higher the error rate.
The heat map is another visualization functionality that users can use to investigate the error rate across one or two features, to find contributors to the model errors across an entire dataset or cohorts.
![Error Analysis Heatmap](./images/ea-heatmap.png)
Use error analysis when you need to:
* Gain a deep understanding of how model failures are distributed across a dataset and across several input and feature dimensions.
* Break down the aggregate performance metrics to automatically discover erroneous cohorts to inform your targeted mitigation steps.
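To give a rough idea of how this component is wired up in code, here is a hedged sketch using the open-source `responsibleai` and `raiwidgets` packages; the dataset is a generic scikit-learn stand-in rather than the hospital-readmission data shown in the screenshots, and the exact API may vary across package versions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from responsibleai import RAIInsights
from raiwidgets import ResponsibleAIDashboard

# Stand-in dataset: a DataFrame holding features plus a 'target' label column
df = load_breast_cancer(as_frame=True).frame
train_df, test_df = df.iloc[:400], df.iloc[400:]

model = RandomForestClassifier(random_state=0).fit(
    train_df.drop(columns="target"), train_df["target"])

rai_insights = RAIInsights(model, train_df, test_df,
                           target_column="target",
                           task_type="classification")
rai_insights.error_analysis.add()  # enables the tree map and heat map views
rai_insights.explainer.add()       # feature importances, used later in this lesson
rai_insights.compute()

ResponsibleAIDashboard(rai_insights)  # serves the interactive dashboard locally
```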
## Model Overview
Evaluating the performance of a machine learning model requires a holistic understanding of its behavior. This can be achieved by reviewing more than one metric, such as error rate, accuracy, recall, precision, or MAE (mean absolute error), to find disparities among performance metrics. One performance metric may look great while inaccuracies are exposed in another. In addition, comparing the metrics for disparities across the entire dataset or across cohorts helps shed light on where the model is performing well or not. This is especially important for seeing the model's performance among sensitive vs insensitive features (e.g., patient race, gender, or age) to uncover potential unfairness the model may have. For example, discovering that the model is more erroneous in a cohort that has sensitive features can reveal potential unfairness.
The Model Overview component of the RAI dashboard helps not just in analyzing the performance metrics of the data representation in a cohort; it also gives users the ability to compare the model's behavior across different cohorts.
![Dataset cohorts - model overview in RAI dashboard](./images/model-overview-dataset-cohorts.png)
The component's feature-based analysis functionality allows users to narrow down data subgroups within a particular feature to identify anomalies on a granular level. For example, the dashboard has built-in intelligence to automatically generate cohorts for a user-selected feature (e.g., *"time_in_hospital < 3"* or *"time_in_hospital >= 7"*). This enables a user to isolate a particular feature from a larger data group to see if it is a key influencer of the model's erroneous outcomes.
![Feature cohorts - model overview in RAI dashboard](./images/model-overview-feature-cohorts.png)
The Model Overview component supports two classes of disparity metrics:
**Disparity in model performance**: These sets of metrics calculate the disparity (difference) in the values of the selected performance metric across subgroups of data. Here are a few examples:
* Disparity in accuracy rate
* Disparity in error rate
* Disparity in precision
* Disparity in recall
* Disparity in mean absolute error (MAE)
**Disparity in selection rate**: This metric captures the difference in selection rate (favorable prediction) among subgroups. An example of this is the disparity in loan approval rates. Selection rate means the fraction of data points in each class classified as 1 (in binary classification) or the distribution of prediction values (in regression).
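As an illustration, a selection-rate disparity of this kind could also be computed outside the dashboard with Fairlearn; the decisions and groups below are hypothetical.

```python
import pandas as pd
from fairlearn.metrics import MetricFrame, selection_rate

y_true = pd.Series([1, 0, 1, 0, 1, 0])   # hypothetical repayment outcomes
y_pred = pd.Series([1, 1, 0, 0, 1, 0])   # hypothetical loan approval decisions
group = pd.Series(["f", "m", "f", "m", "f", "m"])

sr = MetricFrame(metrics=selection_rate,
                 y_true=y_true, y_pred=y_pred,
                 sensitive_features=group)

print(sr.by_group)                             # approval rate per group
print(sr.difference(method="between_groups"))  # the disparity in selection rate
```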
## Data Analysis
> "If you torture the data long enough, it will confess to anything" - Ronald Coase
This statement sounds extreme, but it is true that data can be manipulated to support any conclusion. Such manipulation can sometimes happen unintentionally. As humans, we all have bias, and it is often difficult to consciously know when you are introducing bias in data. Guaranteeing fairness in AI and machine learning remains a complex challenge.
Data is a huge blind spot for traditional model performance metrics. You may have high accuracy scores, but this does not always reflect the underlying data bias that could be in your dataset. For example, if a dataset of employees has 27% women in executive positions and 73% men at the same level, a job advertising AI model trained on this data may target a mostly male audience for senior-level job positions. Having this imbalance in the data skews the model's predictions to favor one gender. This reveals a fairness issue in the form of gender bias in the AI model.
The Data Analysis component on the RAI dashboard helps to identify areas where there is over- and under-representation in the dataset. It helps users diagnose the root cause of errors and fairness issues introduced by data imbalances or a lack of representation of a particular data group. This gives users the ability to visualize datasets based on predicted and actual outcomes, error groups, and specific features. Sometimes discovering an underrepresented data group can also reveal that the model is not learning well, hence the high inaccuracies. Having a model with data bias is not just a fairness issue but shows that the model is not inclusive or reliable.
![Data Analysis component on RAI Dashboard](./images/dataanalysis-cover.png)
Use data analysis when you need to:
* Explore your dataset statistics by selecting different filters to slice your data into different dimensions (also known as cohorts).
* Understand the distribution of your dataset across different cohorts and feature groups.
* Determine whether your findings related to fairness, error analysis, and causality (derived from other dashboard components) are a result of your dataset's distribution.
* Decide in which areas to collect more data to mitigate errors that come from representation issues, label noise, feature noise, label bias, and similar factors.
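A quick representation check of this kind can also be done directly in pandas before any dashboard is involved; the columns below are hypothetical stand-ins for the job-advertising example above.

```python
import pandas as pd

# Hypothetical employee records
df = pd.DataFrame({
    "gender": ["f", "m", "m", "m", "f", "m", "m", "f", "m", "m"],
    "level":  ["executive", "executive", "executive", "executive",
               "ic", "ic", "ic", "ic", "executive", "ic"],
})

# Share of each gender within each job level; a strong skew here can
# propagate into the predictions of a model trained on this data
print(df.groupby("level")["gender"].value_counts(normalize=True))
```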
## Model Interpretability
Machine learning models tend to be black boxes. Understanding which key data features drive a model's predictions can be challenging. It is important to provide transparency as to why a model makes a certain prediction. For example, if an AI system predicts that a diabetic patient is at risk of being readmitted to a hospital within 30 days, it should be able to provide the supporting data that led to its prediction. Having supporting data indicators brings transparency and helps clinicians or hospitals make well-informed decisions. In addition, being able to explain why a model made a prediction for an individual patient enables accountability with respect to health regulations. When you are using machine learning models in ways that affect people's lives, it is crucial to understand and explain what influences the behavior of a model. Model explainability and interpretability help answer questions in scenarios such as:
* Model debugging: Why did my model make this mistake? How can I improve my model?
* Human-AI collaboration: How can I understand and trust the model's decisions?
* Regulatory compliance: Does my model satisfy legal requirements?
The Feature Importance component of the RAI dashboard helps you debug and get a comprehensive understanding of how a model makes predictions. It is also a useful tool for machine learning professionals and decision-makers to explain and show evidence of features influencing a model's behavior for regulatory compliance. Users can explore both global and local explanations to validate which features drive a model's predictions. Global explanations list the top features that affected a model's overall predictions. Local explanations display which features led to a model's prediction for an individual case. The ability to evaluate local explanations is also helpful in debugging or auditing a specific case to better understand and interpret why a model made an accurate or inaccurate prediction.
![Feature Importance component of the RAI dashboard](./images/9-feature-importance.png)
* Global explanations: For example, what features affect the overall behavior of a diabetes hospital readmission model?
* Local explanations: For example, why was a diabetic patient over 60 years old with prior hospitalizations predicted to be readmitted or not readmitted within 30 days back to a hospital?
In the debugging process of examining a model's performance across different cohorts, Feature Importance shows what level of impact a feature has across the cohorts. It helps reveal anomalies when comparing the level of influence the feature has in driving a model's erroneous predictions. The Feature Importance component can show which values in a feature positively or negatively influenced the model's outcome. For instance, if a model made an inaccurate prediction, the component gives you the ability to drill down and pinpoint which features or feature values drove the prediction. This level of detail helps not just in debugging but also provides transparency and accountability in auditing situations. Finally, the component can help you identify fairness issues. To illustrate, if a sensitive feature such as ethnicity or gender is highly influential in driving a model's predictions, this could be a sign of race or gender bias in the model.
![Feature importance](./images/9-features-influence.png)
Use interpretability when you need to:
* Determine how trustworthy your AI system's predictions are by understanding which features are most important for the predictions.
* Approach the debugging of your model by understanding it first and identifying whether the model is using healthy features or merely false correlations.
* Uncover potential sources of unfairness by understanding whether the model is basing predictions on sensitive features or on features that are highly correlated with them.
* Build user trust in your model's decisions by generating local explanations to illustrate their outcomes.
* Complete a regulatory audit of an AI system to validate models and monitor the impact of model decisions on humans.
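For a rough sense of a global importance check outside the dashboard, here is a hedged sketch using scikit-learn's permutation importance on stand-in data; local, per-row explanations are what the dashboard's Feature Importance component itself surfaces interactively.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Global view: how much does shuffling each feature hurt held-out performance?
result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1][:5]:
    print(f"{X.columns[i]}: {result.importances_mean[i]:.3f}")
```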
## Conclusion
All of the RAI dashboard components are practical tools to help you build machine learning models that are less harmful and more trustworthy to society. They improve the prevention of threats to human rights; of discrimination against, or exclusion of, certain groups from life opportunities; and of the risk of physical or psychological injury. They also help build trust in your model's decisions by generating local explanations to illustrate their outcomes. Some of the potential harms can be classified as:
- **Allocation**, if a gender or ethnicity for example is favored over another.
- **Quality of service**. If you train the data for one specific scenario but the reality is much more complex, it leads to a poor performing service.
- **Stereotyping**. Associating a given group with pre-assigned attributes.
- **Denigration**. To unfairly criticize and label something or someone.
- **Over- or under- representation**. The idea is that a certain group is not seen in a certain profession, and any service or function that keeps promoting that is contributing to harm.
### Azure RAI dashboard
The [Azure RAI dashboard](https://learn.microsoft.com/en-us/azure/machine-learning/concept-responsible-ai-dashboard?WT.mc_id=aiml-90525-ruyakubu) is built on open-source tools developed by leading academic institutions and organizations, including Microsoft, that are instrumental for data scientists and AI developers in better understanding model behavior and in discovering and mitigating undesirable issues in AI models.
- Learn how to use the different components by checking out the RAI dashboard [docs](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-responsible-ai-dashboard?WT.mc_id=aiml-90525-ruyakubu).
- Check out some RAI dashboard [sample notebooks](https://github.com/Azure/RAI-vNext-Preview/tree/main/examples/notebooks) for debugging more responsible AI scenarios in Azure Machine Learning.
---
## 🚀 Challenge
To prevent statistical or data biases from being introduced in the first place, we should:
- have a diversity of backgrounds and perspectives among the people working on systems
- invest in datasets that reflect the diversity of our society
- develop better methods for detecting and correcting bias when it occurs
Think about real-life scenarios where unfairness is evident in model-building and usage. What else should we consider?
## [Post-lecture quiz](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/6/)
## Review & Self Study
In this lesson, you have learned some of the practical tools of incorporating responsible AI in machine learning.
Watch this workshop to dive deeper into the topics:
- Responsible AI Dashboard: One-stop shop for operationalizing RAI in practice by Besmira Nushi and Mehrnoosh Sameki
[![Responsible AI Dashboard: One-stop shop for operationalizing RAI in practice](https://img.youtube.com/vi/f1oaDNl3djg/0.jpg)](https://www.youtube.com/watch?v=f1oaDNl3djg "Responsible AI Dashboard: One-stop shop for operationalizing RAI in practice")
> 🎥 Click the image above for a video: Responsible AI Dashboard: One-stop shop for operationalizing RAI in practice by Besmira Nushi and Mehrnoosh Sameki
Reference the following materials to learn more about responsible AI and how to build more trustworthy models:
- Microsoft's RAI dashboard tools for debugging ML models: [Responsible AI tools resources](https://aka.ms/rai-dashboard)
- Explore the Responsible AI toolkit: [Github](https://github.com/microsoft/responsible-ai-toolbox)
- Microsoft's RAI resource center: [Responsible AI Resources Microsoft AI](https://www.microsoft.com/ai/responsible-ai-resources?activetab=pivot1%3aprimaryr4)
- Microsoft's FATE research group: [FATE: Fairness, Accountability, Transparency, and Ethics in AI - Microsoft Research](https://www.microsoft.com/research/theme/fate/)
## Assignment
[Explore RAI dashboard](assignment.md)

@ -0,0 +1,11 @@
# Explore Responsible AI (RAI) dashboard
## Instructions
In this lesson you learned about the RAI dashboard, a suite of components built on open-source tools to help data scientists perform error analysis, data exploration, fairness assessment, model interpretability, counterfactual/what-if assessments, and causal analysis on AI systems. For this assignment, explore some of the RAI dashboard's sample [notebooks](https://github.com/Azure/RAI-vNext-Preview/tree/main/examples/notebooks) and report your findings in a paper or presentation.
## Rubric
| Criteria | Exemplary | Adequate | Needs Improvement |
| -------- | --------- | -------- | ----------------- |
| | A paper or powerpoint presentation is presented discussing RAI dashboard's components, the notebook that was run, and the conclusions drawn from running it | A paper is presented without conclusions | No paper is presented |

@ -9,6 +9,10 @@ In this section of the curriculum, you will be introduced to some real-world app
## Lesson
1. [Real-World Applications for ML](1-Applications/README.md)
2. [Model Debugging in Machine Learning using Responsible AI dashboard components](2-Debugging-ML-Models/README.md)
## Credits
"Real-World Applications" was written by a team of folks, including [Jen Looper](https://twitter.com/jenlooper) and [Ornella Altunyan](https://twitter.com/ornelladotcom).
"Real-World Applications" was written by a team of folks, including [Jen Looper](https://twitter.com/jenlooper) and [Ornella Altunyan](https://twitter.com/ornelladotcom).
"Model Debugging in Machine Learning using Responsible AI dashboard components" was written by [Ruth Yakubu](https://twitter.com/ruthieyakubu)

@ -16,7 +16,7 @@ Azure Cloud Advocates at Microsoft are pleased to offer a 12-week, 26-lesson cur
Travel with us around the world as we apply these classic techniques to data from many areas of the world. Each lesson includes pre- and post-lesson quizzes, written instructions to complete the lesson, a solution, an assignment, and more. Our project-based pedagogy allows you to learn while building, a proven way for new skills to 'stick'.
**✍️ Hearty thanks to our authors** Jen Looper, Stephen Howell, Francesca Lazzeri, Tomomi Imura, Cassie Breviu, Dmitry Soshnikov, Chris Noring, Anirban Mukherjee, Ornella Altunyan, and Amy Boyd
**✍️ Hearty thanks to our authors** Jen Looper, Stephen Howell, Francesca Lazzeri, Tomomi Imura, Cassie Breviu, Dmitry Soshnikov, Chris Noring, Anirban Mukherjee, Ornella Altunyan, Ruth Yakubu and Amy Boyd
**🎨 Thanks as well to our illustrators** Tomomi Imura, Dasani Madipalli, and Jen Looper
@ -116,6 +116,7 @@ By ensuring that the content aligns with projects, the process is made more enga
| 24 | Introduction to reinforcement learning | [Reinforcement learning](8-Reinforcement/README.md) | Introduction to reinforcement learning with Q-Learning | [Python](8-Reinforcement/1-QLearning/README.md) | Dmitry |
| 25 | Help Peter avoid the wolf! 🐺 | [Reinforcement learning](8-Reinforcement/README.md) | Reinforcement learning Gym | [Python](8-Reinforcement/2-Gym/README.md) | Dmitry |
| Postscript | Real-World ML scenarios and applications | [ML in the Wild](9-Real-World/README.md) | Interesting and revealing real-world applications of classical ML | [Lesson](9-Real-World/1-Applications/README.md) | Team |
| Postscript | Model Debugging in ML using RAI dashboard | [ML in the Wild](9-Real-World/README.md) | Model Debugging in Machine Learning using Responsible AI dashboard components | [Lesson](9-Real-World/2-Debugging-ML-Models/README.md) | Ruth Yakubu |
## Offline access

Loading…
Cancel
Save