
018 The Web 2018 Paper Review - An Improved Sampler for Bayesian Personalized Ranking #

Today, we will look at an excellent short paper presented at the World Wide Web conference. The conference presents two main types of papers: long papers, around 10 pages in length, and short papers, also known as poster papers, which run around 2 pages. Short papers mainly present small but notable results, or significant findings from work still in progress. Each edition of the World Wide Web conference selects one short paper for the Best Short Paper Award.

The paper I will be sharing with you today is titled “An Improved Sampler for Bayesian Personalized Ranking by Leveraging View Data” (Link to the paper). Like the previous paper we discussed, it has six authors, all from Tsinghua University and the National University of Singapore.

Bayesian Personalized Ranking #

To understand the content of this paper, we need to explain what “Bayesian Personalized Ranking” (BPR) is. For a detailed introduction to BPR, please refer to Reference [1]. Here, we will provide a high-level summary of BPR.

In simple terms, BPR is a pairwise learning algorithm for recommendation systems. We mentioned various pairwise learning algorithms in our earlier discussion of search algorithms. Pairwise learning does not try to learn the label or response value of each individual data instance; instead, it learns a relative order, aiming to rank all positive instances above the negative ones. In other words, the exact predicted value for each instance does not matter; what matters is that, within each pair, the positive instance is placed above the negative one. This requires BPR to assign a higher prediction score to the positive instance than to the negative one.
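To make the pairwise objective concrete, here is a minimal sketch of a matrix-factorization BPR update in NumPy. All names, sizes, and hyperparameters (`P`, `Q`, `lr`, `reg`) are illustrative choices for this sketch, not values from the paper; BPR itself maximizes ln σ(x̂_ui − x̂_uj) over (user, positive item, negative item) triples:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 100, 50, 8
P = rng.normal(scale=0.1, size=(n_users, k))   # user latent factors
Q = rng.normal(scale=0.1, size=(n_items, k))   # item latent factors

def bpr_step(u, i, j, lr=0.05, reg=0.01):
    """One SGD step on a (user u, positive item i, negative item j) triple.

    BPR maximizes ln sigmoid(x_ui - x_uj): it pushes the predicted score
    of the positive item above that of the sampled negative item.
    """
    pu, qi, qj = P[u].copy(), Q[i].copy(), Q[j].copy()
    x_uij = pu @ (qi - qj)              # predicted preference gap
    sg = 1.0 / (1.0 + np.exp(x_uij))    # sigmoid(-x_uij): gradient weight
    P[u] += lr * (sg * (qi - qj) - reg * pu)
    Q[i] += lr * (sg * pu - reg * qi)
    Q[j] += lr * (-sg * pu - reg * qj)

# repeated updates on one triple drive the positive item's score
# above the negative item's score
for _ in range(200):
    bpr_step(0, 1, 2)
```

Note that only the score *gap* is optimized; the absolute predicted values are irrelevant, which matches the pairwise view described above.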

BPR addresses a limitation of traditional recommendation systems that predict each data point in isolation. For example, when modeling the user-item preference matrix, most earlier algorithms cannot effectively model unobserved data. As a pairwise algorithm, BPR instead models the relative order between observed items and the rest, which makes it particularly effective at capturing user preferences from “implicit feedback” data. Implicit feedback refers to preferences expressed through users’ interactions with the system, rather than explicit statements of how much they like each item. Such behavioral data is often incomplete, so algorithms and models need to handle it effectively.

Main Contributions and Core Methods of the Paper #

After understanding what BPR is about, let’s take a look at the main contributions and core methods of this paper.

First, as we mentioned earlier, the core of BPR is learning a pairwise ranking. During training, parameters are updated from pairs of positive and negative examples. In a natural implicit-feedback dataset, however, positive examples are relatively scarce while negative examples are the vast majority. The traditional approach is therefore to sample negative examples “uniformly” to pair with each positive example, a step usually called “negative sampling.”
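The traditional uniform scheme can be sketched as a small helper (a hypothetical function for illustration, not code from the paper), which rejection-samples over the full item set until it draws an item the user has not interacted with:

```python
import random

def sample_uniform_negative(positives, n_items, rng=random):
    """Draw a negative item uniformly from all items the user has
    not interacted with (rejection sampling over the full item set)."""
    while True:
        j = rng.randrange(n_items)
        if j not in positives:
            return j

# pair each positive example with a uniformly drawn negative
positives = {3, 7, 42}
pairs = [(i, sample_uniform_negative(positives, 1000)) for i in positives]
```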

This paper makes two main contributions. First, the authors found that uniformly sampling negative examples from the entire item set is unnecessary and may even hurt the final learning quality. Second, for e-commerce applications, the authors devised a negative-sampling method that lets the learning algorithm exploit users’ “view” information, significantly improving the overall training effectiveness of the algorithm.

Experimental Results of the Method #

To study the method’s effectiveness, the paper used datasets from two platforms, Beibei and Tmall. The Beibei dataset includes about 160,000 users, 120,000 products, 2.6 million purchases, and 46 million views. The Tmall dataset comprises 30,000 users, over 30,000 products, 460,000 purchases, and over 1.5 million views. Both datasets exhibit a sparsity of over 99%.

First, the authors experimented with sampling negatives from only a small candidate pool rather than from the entire item set; for example, they randomly drew from only a few hundred negatives instead of tens of thousands. The results showed that on the Beibei dataset this not only did not hurt the algorithm’s accuracy but actually improved it. On the Tmall dataset, accuracy dropped slightly rather than improving; nevertheless, the authors considered this trade-off worthwhile because the smaller candidate pool significantly reduced training time. From this experiment, the authors concluded that sampling from the entire item set is unnecessary.
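This restricted-pool idea can be paraphrased in code. Below is a sketch with made-up names and sizes (the paper's exact pool sizes and data structures may differ): pre-draw a small fixed pool of negatives, then sample pairs from that pool instead of scanning the full item set.

```python
import random

def build_negative_pool(positives, n_items, pool_size=500, rng=random):
    """Pre-draw a small pool of negatives; later sampling touches only
    this pool instead of the full (tens-of-thousands-large) item set."""
    pool = []
    while len(pool) < pool_size:
        j = rng.randrange(n_items)
        if j not in positives:
            pool.append(j)
    return pool

positives = {1, 2, 3}
pool = build_negative_pool(positives, n_items=50_000, pool_size=500)
negative = random.choice(pool)   # cheap draw from the restricted pool
```

The point of the experiment is that shrinking the candidate set from tens of thousands to a few hundred barely changes (or even helps) accuracy while cutting training cost.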

Next, the authors introduced a new idea: dividing each user’s data into three sets, the “purchased set” (C1), the “viewed but not purchased set” (C2), and the “remaining items” (C3). They proposed that for BPR to achieve the best results, pairs must be sampled across these three sets. In other words, we form pairs from C1 and C2, from C1 and C3, and from C2 and C3 for learning.

Specifically, the authors tried different sampling proportions over these three sets on the Beibei and Tmall data. The general principle is that more data should be sampled from C3 than from C2, and more from C2 than from C1; in other words, the training algorithm needs to learn more about the user’s preferences over items they are unlikely to want. With this sampling scheme, the authors demonstrated that the model performed about 60% better than traditional BPR or than simply recommending the “most popular items.”
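One way to read this three-set scheme as code is the hypothetical sketch below. The set names C1/C2/C3 follow the text; for simplicity the sketch picks a pair type uniformly, whereas the paper tunes the sampling proportions (C3 > C2 > C1), so the weighting here is a placeholder:

```python
import random

def sample_preference_pair(c1, c2, c3, rng=random):
    """Sample a (preferred, less-preferred) item pair under the assumed
    preference order: purchased (C1) > viewed-not-purchased (C2) > rest (C3).
    """
    pair_types = [(c1, c2), (c1, c3), (c2, c3)]
    usable = [(hi, lo) for hi, lo in pair_types if hi and lo]
    hi, lo = rng.choice(usable)          # paper weights these; sketch is uniform
    return rng.choice(list(hi)), rng.choice(list(lo))

c1, c2, c3 = {10, 11}, {20, 21, 22}, {30, 31, 32, 33}
i, j = sample_preference_pair(c1, c2, c3)   # i is preferred over j
```

Each sampled pair then feeds the usual BPR update, so view data enters training purely through the sampler rather than through a new model.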

Summary #

Today I have introduced an excellent short paper from this year’s World Wide Web Conference. The paper shows how to improve BPR, a classic recommendation algorithm, to both increase training efficiency and significantly enhance the algorithm’s effectiveness.

Let’s review the key points together: First, we explained what BPR is at a high level. Second, we briefly introduced the main contributions and ideas of the paper. Third, we shared the paper’s experimental results.

Finally, I will leave you with a question to ponder: besides the positive-negative pairing approaches used in this paper, what other information from users’ browsing behavior could help us construct more training pairs?

References #

  1. Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI '09), 452-461. AUAI Press, Arlington, Virginia, United States, 2009.