027 ICML 2018 Paper Review - Optimization of Objective Functions Might Amplify Unfairness #

Today we will be discussing a paper nominated for the Best Paper award at ICML 2018, titled “Fairness Without Demographics in Repeated Loss Minimization”.

This paper explores how to optimize the objective function so that different subgroups achieve comparable accuracy, avoiding excessive emphasis on the majority group during the optimization process. The authors of the paper all come from Stanford University.

Main Contributions of the Paper #

This paper also addresses the issue of “fairness” brought about by algorithms, but from a perspective different from the fairness paper we discussed previously. Its core idea is to examine the relationship between machine learning and fairness through the lens of how machine-learning objective functions are optimized.

The authors found that machine-learning algorithms that optimize an “average loss” often yield noticeably worse accuracy for certain minority groups. This is not a problem with the model itself, but with the objective function being optimized. The average-loss objective is dominated by the groups that contribute more data, ensuring that their losses are kept small, while potentially neglecting minority groups that are not numerically dominant.

This, in turn, introduces a second issue: “retention”. Because minority groups endure larger losses under this optimization, users from these groups may leave or be pushed out of the system. Over time, the number of users from minority groups may therefore gradually shrink. This can be an unintended consequence that the designers of the average-loss objective never anticipated. The authors call this phenomenon “disparity amplification”.

An important contribution of this paper is the finding that Empirical Risk Minimization (ERM) exhibits exactly this kind of disparity amplification. ERM covers the objective functions of many classical machine-learning models, such as support vector machines, logistic regression, and linear regression. The authors also found that, over repeated rounds of optimization, ERM can cause a model that starts out fair to gradually become unfair.
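
To make the “average loss” point concrete, here is the standard ERM objective in generic notation (the symbols θ, ℓ, and P below are illustrative shorthand, not taken from this write-up):

```latex
% Empirical Risk Minimization: minimize the average loss over the
% training sample, which approximates the expected loss under the
% data distribution P.
\min_{\theta} \; \hat{R}_{\mathrm{ERM}}(\theta)
  = \frac{1}{n} \sum_{i=1}^{n} \ell(\theta; z_i)
  \;\approx\; \mathbb{E}_{Z \sim P}\bigl[\ell(\theta; Z)\bigr]
```

Because every sample enters with the same weight 1/n, whichever groups contribute more samples dominate the value of this objective.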

To address this issue with ERM, the authors turn to an optimization framework called Distributionally Robust Optimization (DRO), which minimizes the worst-case risk rather than the average risk. The authors demonstrate on real-world data that DRO is more effective than ERM at mitigating unfairness toward minority groups.

Core Method of the Paper #

In order to address the fairness issue among different groups under Empirical Risk Minimization (ERM), the authors first make a new assumption about the data.

The authors assume that the data contain K latent groups, and each data point belongs to one of these K groups with some probability. Of course, we know neither the distributions of these K groups nor the probabilities with which each data point belongs to them; these are all latent quantities that the model needs to estimate.
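
Written out, the assumption is that the overall data distribution is a mixture of K latent group distributions; the mixture notation below is mine, introduced only to make the assumption explicit:

```latex
% The population P is a mixture of K latent groups: group k has its own
% distribution P_k and an unknown proportion alpha_k. Under ERM the
% average risk then decomposes group by group.
P = \sum_{k=1}^{K} \alpha_k P_k, \qquad
R_{\mathrm{ERM}}(\theta) = \sum_{k=1}^{K} \alpha_k R_k(\theta), \quad
R_k(\theta) = \mathbb{E}_{Z \sim P_k}\bigl[\ell(\theta; Z)\bigr]
```

A group with a small proportion α_k contributes very little to the average, which is exactly why ERM can neglect it.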

For each data point, under the current model, we can compute an “expected loss”. In this new framework, because a data point may come from any of the K groups, and each group has its own data distribution, the expected loss differs across groups, giving K different expected losses. The goal is to control the worst of these K losses, i.e., the worst-case scenario: if we can keep the worst loss below some value, then the average loss is certainly no worse. Intuitively, this addresses the problem of disparity amplification.
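
In symbols, the quantity being controlled is the worst per-group risk rather than the mixture average (using the same illustrative notation as above):

```latex
% Worst-case group risk. The average risk is a convex combination of
% the group risks, so it can never exceed the maximum; controlling the
% worst case therefore also controls the average.
R_{\max}(\theta) = \max_{k \in \{1, \dots, K\}} R_k(\theta), \qquad
R_{\mathrm{ERM}}(\theta) = \sum_{k} \alpha_k R_k(\theta) \;\le\; R_{\max}(\theta)
```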

So what happens if we directly apply ERM in such a setting? Here there is one quantity we particularly care about: the expected number of people in each group under this framework. It equals the number of people retained in the group plus the number of new people who join, both of which depend on the group's expected loss. The authors give a formal definition of this expected group size in the paper and analyze when these dynamics reach a stable state under ERM.
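
As a sketch of the kind of dynamics being described, one can write the update below; the symbols λ_k, ν, and b_k are my shorthand for the quantities the paper defines formally (expected group size, a retention function that decreases as the group's loss grows, and a baseline inflow of new users):

```latex
% Expected size of group k at the next round: the users retained from
% the current round (a decreasing function of the group's current risk)
% plus newly arriving users.
\lambda_k^{(t+1)} = \lambda_k^{(t)} \, \nu\!\bigl(R_k(\theta^{(t)})\bigr) + b_k
```

Under ERM, the model θ is then refit at each round to whatever mix of users remains, which is where the feedback loop between losses and group sizes comes from.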

The intuition behind this result is as follows: if the expected group sizes reach a stable state under these update dynamics, the system can settle there and disparity amplification does not occur; if no such stable state is reached, disparity amplification is bound to occur. In other words, under ERM optimization the group sizes may keep shifting, with minority groups gradually losing users.

Based on this theoretical result, the authors propose using Distributionally Robust Optimization (DRO). The core idea of DRO is to counteract the way imbalanced data cause small groups to be effectively under-weighted during optimization.

Specifically, DRO assigns higher weights to the data points with higher losses under the current model, which means it pays more attention to where the current objective performs poorly. For each data point, the probability of it belonging to a high-loss group is effectively amplified, which highlights that group's current losses. In other words, DRO prioritizes optimizing for small groups that currently suffer high losses. With this setup, DRO optimizes the worst-case scenario and avoids disparity amplification.
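
As a minimal sketch of this reweighting idea (not the authors' code): the paper formulates DRO over a chi-square divergence ball around the data distribution, and in its dual form only losses above a threshold η contribute, entering quadratically. The function names below and the treatment of the radius-dependent constant C as a plain hyperparameter are my simplifications.

```python
import numpy as np

def dro_risk(per_example_losses, eta, C):
    """Dual-form chi-square DRO surrogate (sketch).

    Only losses above the threshold `eta` contribute, and they enter
    quadratically, so the objective concentrates on the worst-off
    examples -- and therefore on whichever latent group currently has
    high loss. `C` stands in for the constant determined by the
    chi-square ball radius (set in the paper from the assumed minimum
    group proportion).
    """
    excess = np.maximum(per_example_losses - eta, 0.0)
    return C * np.sqrt(np.mean(excess ** 2)) + eta

def dro_risk_min_eta(per_example_losses, C, num_points=200):
    """The threshold eta is itself an inner minimization; a simple grid
    search over the observed loss range is enough for illustration."""
    grid = np.linspace(per_example_losses.min(), per_example_losses.max(), num_points)
    return min(dro_risk(per_example_losses, eta, C) for eta in grid)
```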

The authors show in the paper that the DRO objective can be optimized within the usual gradient-descent framework. In other words, any algorithm currently trained with ERM can be switched over to the DRO optimization process, thereby avoiding disparity amplification.
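
For example, a standard training step only needs its loss aggregation changed: instead of averaging the per-example losses, aggregate them with the DRO surrogate above. The PyTorch-style sketch below assumes a classification model trained with cross-entropy; the fixed values of eta and C are placeholders rather than details from the paper (in the paper, the threshold is itself optimized).

```python
import torch
import torch.nn.functional as F

def erm_step(model, x, y, optimizer):
    # Ordinary ERM step: the batch loss is the plain average.
    loss = F.cross_entropy(model(x), y)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

def dro_step(model, x, y, optimizer, eta=1.0, C=2.0):
    # Same model, same optimizer; only the aggregation of the
    # per-example losses changes.
    per_example = F.cross_entropy(model(x), y, reduction="none")
    excess = torch.clamp(per_example - eta, min=0.0)
    loss = C * torch.sqrt(torch.mean(excess ** 2) + 1e-12) + eta
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```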

Experimental Results of the Paper #

The authors conducted experiments on both a simulated and a real dataset. Here, we will briefly discuss the experiment carried out on the real data.

The authors studied a task called “auto-completion”: predicting the likelihood of the next word given the current word. The data came from Twitter posts written by two different groups of people, white Americans and black Americans, and the authors simulated the retention rates and model losses of these two groups. The underlying hypothesis is that the English vocabulary and expression patterns of the two groups may differ; if the two groups are simply mixed together during optimization, the experience of black users is likely to be neglected, resulting in low retention among black users.

The experiment found that DRO (Distributionally Robust Optimization) outperformed ERM (Empirical Risk Minimization) both in the quality of service experienced by black users and in their retention rates. This confirms that DRO can indeed serve the needs of minority groups.

Summary #

Today I have walked you through a paper nominated for the Best Paper award at this year’s ICML.

Let’s recap the key points. First, this paper also discusses the “fairness” issues brought about by algorithms, this time from the perspective of how the machine-learning objective function is optimized. Second, an important contribution of the paper is the finding that ERM amplifies unfairness, and to address it the authors apply an algorithmic framework called DRO. Third, the experiments validate DRO, showing that it can indeed mitigate unfairness toward minority groups.

Finally, I leave you with a question to think about. We have discussed algorithmic fairness from different perspectives over these two sessions. Do you have a perspective of your own on this problem?