019 SIGIR 2018 Paper Review - The Relationship between Bias and Popularity #

SIGIR 2018 (SIGIR is short for the ACM Special Interest Group on Information Retrieval) took place from July 8th to July 12th in Ann Arbor, Michigan, USA. Starting from today, I will select a few of the most valuable papers from the conference and read them together with you.

First, let me introduce this conference briefly. SIGIR has a 40-year history since its inception in 1978 and is regarded as a top conference in the field of information retrieval and search. The full name of SIGIR 2018 is “The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval.”

Since its inception, this conference has become an authoritative academic conference in the field of information retrieval, particularly in the areas of search and recommendation technologies. The conference content often includes excellent papers in various fields such as search, recommendation, advertising, information extraction, internet data mining, etc. Every year, it attracts scholars and engineers from all over the world to share their latest research results.

Today, let’s start by taking a look at one of this year’s best papers titled “Should I Follow the Crowd? A Probabilistic Analysis of the Effectiveness of Popularity in Recommender Systems.”

This paper has two authors, both from Universidad Autónoma de Madrid. The first author, Rocío Cañamares, has published several related papers on this topic, while the second author, Pablo Castells, is an academic authority in the field of information retrieval, not only at Universidad Autónoma de Madrid but across Europe; his work has been cited more than 5,000 times.

Main Contributions of the Paper #

To fully understand the main contributions of this paper, we first need to examine a core problem that exists in all information retrieval systems: “bias”. Bias brings about a series of issues, posing significant challenges to the modeling and evaluation of recommendation systems and information retrieval systems as a whole.

So why do information retrieval systems have bias?

Let me illustrate with a simple example. Let’s assume we have two items and many users. For each user, the system randomly displays these two items and asks the user if they like them.

In this hypothetical scenario, the order of presentation is random, so whether a user likes a certain item depends solely on the attributes of the item itself. Aggregated over all users, the overall preference for these two items derives purely from each user’s individual evaluation of the items. Here, we can see that no bias is present.

However, even a slight change in this scenario can easily introduce various biases. For instance, if we have over ten thousand items, and although we still display them randomly to users, users may gradually become tired after viewing a certain number of items. As a result, their judgment of item preferences might be influenced by this weariness, and they may even give up looking at the remaining items altogether.

There are many similar situations. For example, instead of individually presenting each item to users, we provide a list. In this case, users may assume that the list has a certain order, such as assuming that items ranked higher on the list are more important. Research shows that, in the presence of a list, users are likely to make preference judgments based on the order of the list. Clearly, in such a scenario, users’ preference judgments are influenced by the order of the list.

The biases mentioned above are known as “presentation bias”. In addition, information systems exhibit other types of bias, such as “systemic bias”: for example, a news system that only recommends entertainment news to users and never political news. In this case, the users’ observed preferences are biased, because the system never gives them a chance to express a potential preference for political news.

Scholars in the field of information retrieval and recommendation systems have long been aware of the impact of bias on modeling. Whether it is the presentation bias mentioned earlier or systemic bias, if we directly utilize data generated from user-system interactions, the resulting models and the evaluation methods we employ will also be biased, leading to potentially imprecise conclusions.

This paper aims to systematically discuss the problems introduced by bias in recommendation systems. Specifically, the paper focuses on exploring the relationship between bias and “popularity”.

The paper considers the situation in which certain items have been recommended to many people, or have been liked or rated by many people. Will such popular items introduce unexpected biases into the evaluation of recommendation results?

In previous research, there was only an intuitive suspicion about highly popular items: if a recommendation system only recommends popular items, it must be biased. However, most earlier studies did not quantitatively explain the relationship between bias and evaluation. This paper provides a theoretical framework to guide our understanding of bias and of the changes it introduces into evaluation metrics.

Core Methodology of the Paper #

Today, we won’t go into the details of the theoretical framework of this paper. I will focus on providing a general idea to help you understand the purpose of this paper.

In simple terms, the authors use several binary random variables to express the relationship between bias and popularity: whether a user rates an item, whether a user actually has a preference for (likes) an item, and whether a user has seen an item. One detail, or technique, here is how to use the language of probability to express the relationship between these three variables clearly.
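To make this concrete, here is a minimal sketch in my own notation (not necessarily the paper’s exact formulation), assuming only that a user can rate an item after having seen it:

```latex
% Three binary random variables per (user, item) pair:
%   rated -- the user rates the item
%   rel   -- the user actually likes (has a preference for) the item
%   seen  -- the user has seen / discovered the item
% If a rating can only happen after the item has been seen, then
\[
  P(\mathit{rated} \mid i) = P(\mathit{rated} \mid \mathit{seen}, i)\, P(\mathit{seen} \mid i),
\]
% and discovery itself may or may not depend on whether the item is liked:
\[
  P(\mathit{seen} \mid i)
    = P(\mathit{seen} \mid \mathit{rel}, i)\, P(\mathit{rel} \mid i)
    + P(\mathit{seen} \mid \lnot\mathit{rel}, i)\, \bigl(1 - P(\mathit{rel} \mid i)\bigr).
\]
```

Bias enters exactly when the seen-related terms depend on something other than the item’s intrinsic appeal.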

The authors make some simplifying assumptions, for example that the items in the test set have not appeared in the training set. This makes it possible to write down, in expectation, how users rate the items in the test set as a function of whether they have a preference for each test item. With this expected relationship in hand, the authors then derive what the ideal ranking of the test items would be under optimal conditions. The theoretical derivation itself has little direct practical use, but it is the first time researchers have used a mathematical model to characterize, in this level of detail, how popularity relates to the optimal ranking on a test set.
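As a rough illustration of the kind of expectation involved, still in my own simplified notation rather than the paper’s exact formula: if we only get credit for recommending an item that the user likes but has not already rated, then

```latex
% Expected contribution of recommending item i to a random user:
\[
  \mathbb{E}[\text{test hit on } i]
    \;\propto\; P(\mathit{rel}, \lnot\mathit{rated} \mid i)
    \;=\; P(\mathit{rel} \mid i) - P(\mathit{rel}, \mathit{rated} \mid i).
\]
% The ranking that is optimal in expectation sorts items by this quantity,
% whereas popularity sorts by the number of observed positive ratings,
% roughly P(rel, rated | i), and average rating sorts by P(rel | rated, i).
```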

Next, the authors also examine how this optimal ranking changes under two extreme situations. In the first, the users’ past behavior depends solely on the attributes of the items and carries no other bias. In the second, the users’ past behavior is independent of the items’ attributes, meaning it is driven only by other biases.

Under the first extreme, the true optimal ranking coincides with the ranking we observe from the data, namely the ranking by popularity. Under the second extreme, the optimal ranking is instead the ranking by average rating.
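Continuing the hedged notation from above, here is a rough intuition for why each extreme favors each ranking (assuming a rating is positive exactly when the user likes the item):

```latex
% Popularity counts positive ratings; average rating is the share of positives:
%   popularity(i)  ~  P(rated, rel | i)
%   avg_rating(i)  ~  P(rel | rated, i)
%
% Extreme 1 (no bias): the chance of rating a liked item is the same constant
% rho for every item, so
\[
  P(\mathit{rated}, \mathit{rel} \mid i) = \rho \, P(\mathit{rel} \mid i),
\]
% and popularity orders items by true preference; note also that
% P(rel, \lnot rated | i) = (1 - rho) P(rel | i) here, so this matches
% the optimal order in the earlier sketch.
%
% Extreme 2 (pure bias): rating is independent of whether the item is liked,
% P(rated | rel, i) = P(rated | \lnot rel, i), hence
\[
  P(\mathit{rel} \mid \mathit{rated}, i) = P(\mathit{rel} \mid i),
\]
% and the average rating becomes an unbiased estimate of the item's true appeal.
```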

Of course, you may argue that discussing these two extreme situations is not practically meaningful. However, the discussion of these two extreme situations actually proves that only in the absence of bias is ordering based on popularity the optimal choice on average. And obviously, real-world bias exists, so relying on popularity-based ordering, even on average, is not the optimal choice.

The paper then discusses how whether a user has seen a particular item affects user behavior. Many previous works have explored this aspect, but this paper reaches an interesting conclusion. Once the bias in which items users get to see is taken into account, simulation shows that random recommendation performs much better than previously observed; popularity-based ranking, while decent, is not much better than random; and ranking by average rating actually outperforms popularity-based ranking. This is a new finding that differs from much of the earlier work.
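The paper’s own simulation setup is more elaborate, but here is a minimal, hypothetical Python sketch (the distributions, parameter names, and evaluation are my own assumptions, not the paper’s) of how discovery bias can make average rating beat popularity:

```python
import numpy as np

# Hypothetical toy simulation (my own assumptions, not the paper's exact setup):
# each item has a true preference probability p_like and, independently of it,
# an exposure probability p_seen -- the source of discovery bias.

rng = np.random.default_rng(0)
n_users, n_items, k = 5000, 200, 10

p_like = rng.beta(2, 5, size=n_items)   # probability a random user likes item i
p_seen = rng.beta(2, 5, size=n_items)   # probability a random user discovers item i

seen = rng.random((n_users, n_items)) < p_seen    # which items each user discovers
liked = rng.random((n_users, n_items)) < p_like   # latent true preferences
rated = seen                                      # assume every seen item gets rated
positive = rated & liked                          # observed positive ratings

popularity = positive.sum(axis=0)                            # count of positive ratings
avg_rating = popularity / np.maximum(rated.sum(axis=0), 1)   # share of positive ratings

def expected_precision_at_k(scores):
    """Expected precision@k of recommending the top-k items by `scores`
    to a fresh user, measured against the true preference probabilities."""
    top = np.argsort(-scores)[:k]
    return p_like[top].mean()

print("random     :", expected_precision_at_k(rng.random(n_items)))
print("popularity :", expected_precision_at_k(popularity.astype(float)))
print("avg rating :", expected_precision_at_k(avg_rating))
```

In this toy setup discovery is unrelated to preference (the second extreme above), so the average rating estimates each item’s true appeal directly, while raw popularity conflates exposure with appeal.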

Further Discussion #

Although this paper won the Best Paper Award at SIGIR 2018, if we look at it from a broader perspective, the theoretical framework the authors develop describes only a particular type of bias in recommendation systems. To model bias more universally, one needs randomized data and causal inference to analyze bias in arbitrary situations; the probability model proposed in the paper is valid only under the assumptions discussed in this study.

However, the flaws do not outweigh the merits. This paper provides us with a lot of meaningful content, both in terms of conclusions and practical analysis, helping us to contemplate the challenges that bias brings to modeling and how we should respond to them.

Summary #

Today I talked to you about the best paper of SIGIR 2018.

Let’s recap the key points. First, we described the problem this paper addresses and its contributions: the relationship between bias and popularity, and a systematic treatment of the problems that bias brings to recommendation systems. Second, we briefly introduced the core of the proposed method, including the random variables, the expected relationships, and the derivation of the optimal ranking under ideal conditions. Third, we briefly discussed the paper’s limitations.

Finally, I leave you with a question to ponder: Why do general recommendation systems prefer algorithms that recommend popular items, disregarding bias?