The main fairness-related harms can be classified as:
- Allocation
- Quality of service
- Stereotyping
- Denigration
- Over- or under- representation
Let's take a look at the examples.
### Allocation
Consider a hypothetical system for screening loan applications. The system tends to pick white men as better candidates over other groups. As a result, loans are withheld from certain applicants.
Another example would be an experimental hiring tool developed by a large corporation to screen candidates. The tool systematically discriminated against one gender because its models were trained to prefer words associated with the other. As a result, it penalized candidates whose resumes contained words such as “women's rugby team”.
✅ Do a little research to find a real-world example of something like this
### Quality of Service
Researchers found that several commercial gender classifiers had higher error rates for images of women with darker skin tones than for images of men with lighter skin tones. [Reference](https://www.media.mit.edu/publications/gender-shades-intersectional-accuracy-disparities-in-commercial-gender-classification/)
Another infamous example is a hand soap dispenser that did not seem to be able to sense people with dark skin. [Reference](https://gizmodo.com/why-cant-this-soap-dispenser-identify-dark-skin-1797931773)
### Stereotyping
A stereotypical gender view was found in machine translation. When translating “he is a nurse and she is a doctor” into Turkish, problems were encountered. Turkish is a genderless language that has one pronoun, “o”, to convey a singular third person, but translating the sentence back from Turkish to English yields the stereotypical and incorrect result “she is a nurse and he is a doctor”.
![translation to Turkish](images/english-to-turkish.png)
![translation back to English](images/turkish-to-english.png)
### Denigration
An image labeling technology infamously mislabeled images of dark-skinned people as gorillas. Mislabeling is harmful not just because the system made a mistake, but because it specifically applied a label that has a long history of being purposefully used to denigrate Black people.
[![AI: Ain't I a Woman?](https://img.youtube.com/vi/QxuyfWoVV98/0.jpg)](https://www.youtube.com/watch?v=QxuyfWoVV98 "AI, Ain't I a Woman?")
> Video: AI, Ain't I a Woman - a performance showing the harm caused by racist denigration by AI
### Over- or under- representation
Skewed image search results can be a good example of this harm. When searching for images of professions with an equal or higher percentage of men than women, such as engineering or CEO, watch for results that are more heavily skewed toward a given gender.
These five main types of harms are not mutually exclusive, and a single system can exhibit more than one type of harm. In addition, each case varies in its severity. For instance, unfairly labeling someone as a criminal is a much more severe harm than mislabeling an image. It's important, however, to remember that even relatively non-severe harms can make people feel alienated or singled out and the cumulative impact can be extremely oppressive.
✅ Discussion: Revisit some of the examples and see if they show different harms.
## Detecting unfairness
There are many reasons why a given system behaves unfairly. Societal biases, for example, might be reflected in the datasets used to train it. Hiring unfairness, for instance, might be exacerbated by overreliance on historical data: by using the patterns in resumes submitted to the company over a 10-year period, the model determined that men were more qualified because the majority of resumes came from men, a reflection of past male dominance across the tech industry.
Inadequate data about a certain group of people can be another reason for unfairness. For example, image classifiers have a higher rate of error for images of dark-skinned people because darker skin tones were underrepresented in the training data.
Wrong assumptions made during development cause unfairness too. For example, a facial analysis system intended to predict who is going to commit a crime based on images of people's faces rests on a damaging assumption, one that could lead to substantial harms for people who are misclassified.
## Understand your models and build in fairness
Although many aspects of fairness are not captured in quantitative fairness metrics, and it is not possible to fully remove bias from a system to guarantee fairness, you are still responsible for detecting and mitigating fairness issues as much as possible.
When you are working with machine learning models, it is important to understand them by ensuring their interpretability and by assessing and mitigating unfairness.
Let's use the loan selection example to figure out each factor's level of impact on the prediction.
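Here is a minimal sketch of that idea with made-up loan data; the factor names such as `income` and `existing_debt` are hypothetical, and permutation importance from scikit-learn stands in for whichever interpretability technique you choose:

```python
# A minimal sketch using hypothetical, randomly generated loan data.
# The factor names below are invented for illustration only.
import numpy as np
import pandas as pd
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "income": rng.normal(50_000, 15_000, 300),
    "credit_history_years": rng.integers(0, 30, 300),
    "existing_debt": rng.normal(10_000, 5_000, 300),
})
# Hypothetical label: 1 = repaid the loan, 0 = defaulted
y = (X["income"] - X["existing_debt"] + rng.normal(0, 10_000, 300) > 35_000).astype(int)

model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)

# Shuffle each factor in turn and measure how much the model's accuracy drops
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for factor, impact in zip(X.columns, result.importances_mean):
    print(f"{factor}: {impact:.3f}")
```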
## Assessment methods
1. Identify harms (and benefits)
2. Identify the affected groups
3. Define fairness metrics
### Identify harms (and benefits)
What are the harms and benefits associated with lending? Think about false negative and false positive scenarios:
**False negatives** (reject, but Y=1) - in this case, an applicant who would be capable of repaying a loan is rejected. This is an adverse event because loans are withheld from qualified applicants.
**False positives** (accept, but Y=0) - in this case, the applicant does get a loan but eventually defaults. As a result, the applicant's case will be sent to a debt collection agency, which can affect their future loan applications.
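To make these two scenarios concrete, here is a minimal sketch with hypothetical labels, where 1 means the applicant repays and a prediction of 1 means the loan is approved:

```python
# A minimal sketch with hypothetical labels.
# y_true: 1 = the applicant actually repays, 0 = the applicant defaults
# y_pred: 1 = the model approves the loan, 0 = the model rejects it
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"False negatives (reject, but Y=1): {fn}")  # qualified applicants who were rejected
print(f"False positives (accept, but Y=0): {fp}")  # approved applicants who default
```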
### Identify affected groups
The next step is to determine which groups are likely to be affected. For example, in the case of a credit card application, a model might determine that women should receive much lower credit limits compared with their spouses who share household assets. An entire demographic, defined by gender, is thereby affected.
### Define fairness metrics
You have identified harms and an affected group, in this case, delineated by gender. Now, use the quantified factors to disaggregate their metrics. For example, using the data below, you can see that women have the largest false positive rate and men have the smallest, and that the opposite is true for false negatives.
✅ In a future lesson on Clustering, you will see how to build this 'confusion matrix' in code
| | False positive rate | False negative rate | count |
| ---------- | ------------------- | ------------------- | ----- |
| Women | 0.37 | 0.27 | 54032 |
| Men | 0.31 | 0.35 | 28620 |
| Non-binary | 0.33 | 0.31 | 1266 |
This table tells us several things. First, we note that there are comparatively few non-binary people in the data. The data is skewed, so you need to be careful how you interpret these numbers.
In this case, we have 3 groups and 2 metrics. When we are thinking about how our system affects the group of customers (loan applicants), this may be sufficient, but when you want to define a larger number of groups, you may want to distill this to smaller sets of summaries. To do that, you can add more metrics, such as the largest difference or smallest ratio of the false negative and false positive rates.
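Here is a minimal sketch of how such a disaggregation could be computed with Fairlearn's `MetricFrame`; the labels and the `gender` feature are hypothetical, so the printed rates will not match the table above:

```python
# A minimal sketch with hypothetical data; the rates it prints are illustrative only.
import pandas as pd
from fairlearn.metrics import MetricFrame, false_negative_rate, false_positive_rate

y_true = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]   # actual repayment outcome
y_pred = [1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1]   # the model's approval decision
gender = ["women", "women", "women", "women",
          "men", "men", "men", "men",
          "non-binary", "non-binary", "non-binary", "non-binary"]

metrics = MetricFrame(
    metrics={"false positive rate": false_positive_rate,
             "false negative rate": false_negative_rate},
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=pd.Series(gender, name="gender"),
)

print(metrics.by_group)      # one row per group, like the table above
print(metrics.difference())  # largest difference across groups, per metric
print(metrics.ratio())       # smallest ratio across groups, per metric
```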
Stop and Think: What other groups are likely to be affected when applying for a loan?
## Mitigating unfairness
To mitigate unfairness, explore the model to generate various mitigated models and compare the tradeoffs each makes between accuracy and fairness, then select the model with the most desirable tradeoff.
This introductory lesson does not dive deeply into the details of algorithmic unfairness mitigation, such as the post-processing and reductions approaches, but here is a tool that you may want to try.
## Fairlearn
[Fairlearn](https://fairlearn.github.io/) is an open-source Python package that allows you to assess your systems' fairness and mitigate unfairness.
The tool helps you to assess how a model's predictions affect different groups, enables you to compare multiple models by using fairness and performance metrics, and supplies a set of algorithms to mitigate unfairness in binary classification and regression.
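As a minimal sketch, assuming hypothetical applicant data and an illustrative demographic-parity constraint, the reductions approach could be used like this:

```python
# A minimal sketch on hypothetical, randomly generated data; the estimator and the
# demographic-parity constraint are illustrative choices, not the only options.
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.metrics import MetricFrame, selection_rate
from fairlearn.reductions import DemographicParity, ExponentiatedGradient

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                          # hypothetical applicant features
gender = rng.choice(["women", "men"], size=500)        # sensitive feature
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)   # hypothetical repayment label

# Unmitigated baseline
unmitigated = LogisticRegression().fit(X, y)

# Mitigated model: the reductions approach wraps the same estimator
# in a fairness constraint
mitigated = ExponentiatedGradient(LogisticRegression(), constraints=DemographicParity())
mitigated.fit(X, y, sensitive_features=gender)

# Compare how often each model approves applicants from each group
for name, model in [("unmitigated", unmitigated), ("mitigated", mitigated)]:
    frame = MetricFrame(metrics=selection_rate, y_true=y,
                        y_pred=model.predict(X), sensitive_features=gender)
    print(name)
    print(frame.by_group)
```

Comparing the per-group selection rates of the two models is one way to navigate the tradeoff between accuracy and fairness described above.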
- Learn how to use the different components by checking out Fairlearn's [GitHub](https://github.com/fairlearn/fairlearn/)
- Explore the [user guide](https://fairlearn.github.io/main/user_guide/index.html) and [examples](https://fairlearn.github.io/main/auto_examples/index.html)
- Try some [sample notebooks](https://github.com/fairlearn/fairlearn/tree/master/notebooks).
- Learn [how to enable fairness assessments](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-fairness-aml?WT.mc_id=academic-15963-cxa) of machine learning models in Azure Machine Learning.
- Check out these [sample notebooks](https://github.com/Azure/MachineLearningNotebooks/tree/master/contrib/fairness) for more fairness assessment scenarios in Azure Machine Learning.
## 🚀 Challenge
To prevent biases from being introduced in the first place, we should:
- have a diversity of backgrounds and perspectives among the people working on systems
- invest in datasets that reflect the diversity of our society
- develop better methods for detecting and correcting bias when it occurs
Think about real-life scenarios where unfairness is evident in model-building and usage. What else should we consider?
## [Post-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/6/)
## Review & Self Study
In this lesson, you have learned some basics of the concepts of fairness and unfairness in machine learning.
Watch this workshop to dive deeper into the topics:
- YouTube: Fairness-related harms in AI systems: Examples, assessment, and mitigation by Hanna Wallach and Miro Dudik [Fairness-related harms in AI systems: Examples, assessment, and mitigation - YouTube](https://www.youtube.com/watch?v=1RptHwfkx_k)
Also, read:
- Microsoft's RAI resource center: [Responsible AI Resources – Microsoft AI](https://www.microsoft.com/en-us/ai/responsible-ai-resources?activetab=pivot1%3aprimaryr4)
- Microsoft's FATE research group: [FATE: Fairness, Accountability, Transparency, and Ethics in AI - Microsoft Research](https://www.microsoft.com/en-us/research/theme/fate/)
- Explore the Fairlearn toolkit: [Fairlearn](https://fairlearn.org/)
- Read about Azure Machine Learning's tools to ensure fairness: [Azure Machine Learning](https://docs.microsoft.com/en-us/azure/machine-learning/concept-fairness-ml?WT.mc_id=academic-15963-cxa)
## [Assignment Name](assignment.md)
