004 Fine Reading of One of the Best Long Papers at EMNLP 2017 #

EMNLP (Conference on Empirical Methods in Natural Language Processing) is a large and highly influential international conference in the field of natural language processing. It is organized annually by SIGDAT (Special Interest Group on Linguistic Data and Corpus-based Approaches to NLP), a special interest group of the ACL (Association for Computational Linguistics). First held in 1996, the conference has a history of more than 20 years. EMNLP 2017 took place in Copenhagen, Denmark, from September 7th to 11th.

Each year, the conference selects two of the most valuable papers from its many submissions for the Best Long Paper Award. Today, I will take you through a close reading of one of this year’s Best Long Papers at EMNLP, titled “Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints”. The paper is quite timely: the academic community has recently raised concerns about the “bias” that data and machine learning algorithms may introduce, and many scholars are studying how to evaluate and detect such biases in order to mitigate or even eliminate them.

Author Information #

The first author, Jieyu Zhao, was a Ph.D. student in Computer Science at the University of Virginia at the time of publication. She has since transferred to the University of California, Los Angeles, where she researches how to detect and mitigate bias in machine learning algorithms. She previously obtained her bachelor’s and master’s degrees from Beihang University in Beijing, and interned at Didi Research Institute in 2016.

The second author, Tianlu Wang, is also a Ph.D. student in the Computer Science department at the University of Virginia, and previously obtained a bachelor’s degree in Computer Science from Zhejiang University.

The third author, Mark Yatskar, is a Ph.D. student in the Computer Science department at the University of Washington. He has published several high-quality papers in the fields of natural language processing and image processing.

The fourth author, Vicente Ordóñez, is currently an Assistant Professor in the Computer Science department at the University of Virginia, working at the intersection of natural language processing and computer vision. He obtained his Ph.D. from the University of North Carolina at Chapel Hill in 2015, interning at Microsoft Research, eBay Research, and Google during his doctoral studies. He is the doctoral advisor of the second author, Tianlu Wang.

The last author, Kai-Wei Chang, is the doctoral advisor of the first author, Jieyu Zhao. He is currently an Assistant Professor at the University of California, Los Angeles, and previously worked at the University of Virginia. He received his Ph.D. from the University of Illinois at Urbana-Champaign in 2015 under the supervision of the renowned professor Dan Roth. During his graduate studies, he interned three times at Microsoft Research and also interned at Google Research. Early in his research career, he was involved in developing the well-known support vector machine library LibLinear.

Main Contributions of the Paper #

An important goal of machine learning is to learn patterns from data. Researchers have recently discovered that data may carry biases imposed by society, and that machine learning algorithms are likely to amplify those biases. The problem can be especially evident in tasks related to natural language processing. For example, in some datasets the word “cooking” may co-occur with “female” more than 30 percentage points more often than with “male”. After a model is trained on such a dataset, the gap may grow to 68 percentage points on the test set. In other words, although the social bias is already present in the dataset, it is further amplified by the machine learning algorithm.

The core question of this paper is therefore how to design algorithms that eliminate this amplification, making machine learning algorithms “fairer”. Note that the goal is to eliminate the amplified portion of the bias, not to pursue absolute balance. In the dataset mentioned above, the training set already shows “female” co-occurring with “cooking” more often than “male” does. The algorithm only needs to ensure that this gap does not grow any further on the test set; in other words, it should preserve the original 30-point difference without widening it. The paper does not aim to artificially force this difference down to zero.

The paper proposes a constrained optimization algorithm that imposes corpus-level constraints on the test-time predictions, so that the machine learning algorithm’s output exhibits a bias ratio similar to that of the training set. Note that this is a calibration of existing test results, so it can be applied on top of many different algorithms.

The authors evaluated the proposed algorithm on two datasets and found that it not only significantly reduced bias amplification (by up to 30% to 40%), but also maintained the original test accuracy. The proposed algorithm is thus clearly effective.

Core Method of the Research Paper #

So, what method did the authors propose?

First, they introduce a concept called the “bias score”. This value measures the co-occurrence proportion between a certain variable and the target variable: for example, how often the word “male” appears with a particular verb (e.g., “cooking”), and likewise how often the word “female” appears with the same verb.

Note that because “male” and “female” are the two options for “gender”, the bias scores of these two words with respect to the same verb must sum to 1. The difference in bias score between the training set and the model’s predictions on the test set is used to evaluate whether bias amplification has occurred. In the earlier example, the bias score of “cooking” toward “female” is 0.66 in the training set but 0.84 in the test predictions, which indicates that the bias is amplified by the algorithm.
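Written out, and consistent with the numbers above, the bias score is just a normalized co-occurrence count. The formula below is a sketch of the paper’s definition, with c(o, g) denoting how often an output word o (here, a verb) co-occurs with gender g:

```latex
% Bias score: the fraction of o's gendered co-occurrences that involve g.
b(o, g) = \frac{c(o, g)}{\sum_{g' \in \{\text{male},\, \text{female}\}} c(o, g')}
% Running example: b(cooking, female) = 0.66 on the training set, while
% the model's test-time predictions give b* = 0.84, so the algorithm
% amplifies the bias by 0.84 - 0.66 = 0.18.
```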

After introducing the bias score, the authors define constraints on the test-set results. The basic idea is to re-select the predicted labels on the test set so that the distribution of predicted labels stays close to the expected distribution. In the running example, we want to bring the bias score of “female” in the context of “cooking” back from 0.84 to around 0.66. This is possible because the algorithm adjusts the test-time predictions directly.
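In symbols, the constraint described above can be written as follows, where b_train is the training-set bias score, b* is the bias score recomputed from the re-selected test predictions, and γ is a small slack margin whose exact value is a tuning choice:

```latex
% For every output word o and gender g, keep the test-time bias score
% within a margin gamma of the training-set score:
b^{\text{train}}(o, g) - \gamma \;\le\; b^{*}(o, g) \;\le\; b^{\text{train}}(o, g) + \gamma
```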

Modeling all of these constraints yields a classic constrained optimization problem. Solving it requires jointly optimizing the predictions over the entire test set, which is often difficult given the size of the test data. The authors therefore adopt Lagrangian relaxation to simplify the original optimization problem.

In other words, after applying Lagrangian relaxation, the original constrained optimization problem becomes an unconstrained one, and the algorithm can be implemented as an iterative updating process: for each test instance, it finds the currently optimal label assignment, and then it updates the Lagrange multipliers. The algorithm repeats these passes over the test set until the constraints are satisfied or an iteration limit is reached.
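To make this two-step loop concrete, here is a minimal Python sketch. It simplifies the paper’s structured CRF outputs to a single categorical label per test instance, and all names (`debias_predictions`, `scores`, `A`, `b`) are illustrative assumptions rather than the authors’ code. The corpus-level constraints are assumed to be given in the linear form `A @ counts <= b`, where `counts` tallies the labels chosen over the whole test set:

```python
import numpy as np

def debias_predictions(scores, A, b, lr=0.1, n_iters=100):
    """Corpus-level constrained inference via Lagrangian relaxation.

    scores : (n_instances, n_labels) array of model scores; each test
             instance picks one label (a simplification of the CRF).
    A, b   : linear constraints A @ counts <= b over corpus-level
             label counts.
    """
    n_instances, n_labels = scores.shape
    lam = np.zeros(A.shape[0])       # one multiplier per constraint, kept >= 0
    choices = scores.argmax(axis=1)  # unconstrained predictions
    for _ in range(n_iters):
        # Step 1: per-instance inference with the current penalty;
        # choosing label l costs an extra lam @ A[:, l].
        penalty = lam @ A  # shape (n_labels,)
        choices = (scores - penalty).argmax(axis=1)
        # Step 2: measure constraint violation over the whole corpus.
        counts = np.bincount(choices, minlength=n_labels)
        violation = A @ counts - b
        if np.all(violation <= 1e-9):
            break  # all corpus-level constraints satisfied
        # Step 3: projected subgradient step on the multipliers.
        lam = np.maximum(0.0, lam + lr * violation)
    return choices
```

With `lam` initialized to zero, the first pass reproduces the unconstrained predictions; subsequent passes trade a little model score for satisfying the corpus-level constraints.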

Experimental Results of the Method #

The authors conducted experiments on two datasets: imSitu and MS-COCO. imSitu is a visual semantic role labeling task consisting of up to 120,000 images annotated with corresponding textual semantic information; for example, some images depict cooking scenes with male or female participants. The authors used 212 verbs in their experiments. MS-COCO is treated as a multi-label image classification problem, with the goal of predicting labels for 80 different object classes.

For both tasks, the authors chose a Conditional Random Field (CRF) as the underlying model; CRFs are often the method of choice for these kinds of structured prediction problems. For features, they used various deep-learning-based features provided with the datasets. The proposed bias-correction algorithm was then applied to the test-set predictions on top of the CRF.
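To connect this setup with the earlier sketch, here is one hedged way the gender constraints from the formula above could be assembled into the `(A, b)` form expected by `debias_predictions`; the label indexing and the function name are illustrative assumptions, not the paper’s implementation:

```python
import numpy as np

def gender_constraints(b_train, gamma, n_verbs):
    """Build (A, b) so that, for every verb v, the predicted bias score
    stays within gamma of the training-set score b_train[v].

    Labels are indexed as 2*v for (verb v, male) and 2*v + 1 for
    (verb v, female); this indexing is an illustrative choice.
    """
    rows, rhs = [], []
    for v in range(n_verbs):
        for sign, bound in ((+1, b_train[v] + gamma),   # upper bound
                            (-1, b_train[v] - gamma)):  # lower bound
            row = np.zeros(2 * n_verbs)
            # The bound sign * (c_f - bound * (c_m + c_f)) <= 0 is the
            # bias-score inequality rewritten linearly in the counts.
            row[2 * v] = sign * (-bound)         # coefficient on c_m
            row[2 * v + 1] = sign * (1 - bound)  # coefficient on c_f
            rows.append(row)
            rhs.append(0.0)
    return np.stack(rows), np.array(rhs)
```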

It is worth noting that although the algorithm itself operates on test data, it does not need to know the true labels of the test data; the label statistics it relies on come only from the training set, a point the authors repeatedly emphasize.

On both datasets the method performs well: the original prediction accuracy does not significantly decrease, while the measured gender bias drops substantially after the test-set adjustment, by more than 40% at the maximum.

Summary #

Today I talked to you about one of the Best Long Papers of EMNLP 2017. The paper addresses the problem of social bias in datasets and the tendency of machine learning algorithms to further amplify this bias. The authors propose an algorithm that adjusts the predictions on a test dataset so as to reduce the bias, bringing the bias level on the test data back in line with that of the training data.

Let’s recap the key points: first, a brief introduction to the paper’s authors; second, a detailed explanation of the problem the paper aims to solve and its contributions; third, an overview of the core content of the proposed method.

Finally, I leave you with a question to ponder: why can machine learning algorithms amplify existing biases in the training set? How does this relate to specific algorithms?

Further reading: Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints