015 WSDM 2018 Paper Review - How to Use Contextual Information in Deep Learning Models #

Today, we will continue to summarize a paper from WSDM 2018 titled “Latent Cross: Making Use of Context in Recurrent Recommender Systems.” This paper, also from the Google team, aims to simulate and implement the effects of widely used “cross features” in recommendation systems using deep models.

Author Information #

All the authors of this paper are from Google, and here we will give a brief introduction to the main authors.

The first author of the paper, Alex Beutel, is a Senior Scientist at Google and joined the company in 2016. Beutel graduated from Carnegie Mellon University with a Ph.D. in Computer Science under the guidance of Alex Smola, an authority in machine learning.

The last author, Ed H. Chi, is a Chief Scientist at Google. He holds 39 patents and has published more than 110 papers. Prior to joining Google, Chi was a Senior Researcher at the Palo Alto Research Center. Chi received his Ph.D. in Computer Science from the University of Minnesota.

Main Contributions of the Paper #

First, let’s look at the main contributions of this article and outline the problems it addresses in a particular scenario.

Recommendation systems often need to model current scenarios, sometimes referred to as “context.” In the past, many traditional methods have explored how to utilize context information for recommendations, such as using tensors for modeling or leveraging temporal characteristics to process context information.

In recent years, with the development of deep learning, more and more deep learning models have been applied in the field of recommendation systems. However, there has been no direct exploration of how to incorporate context into deep learning models. This article aims to make an attempt in this aspect.

One challenging problem arises here. In the past, such context was often represented using “cross features,” where two features are multiplied together to create a new feature. This method has been widely used in matrix factorization and tensor factorization models. In deep learning, however, the common practice has been not to use such cross features directly. Yet in recommendation systems where context is crucial, ignoring cross features often leads to suboptimal results.
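As a simple illustration (my own example, not from the paper), a cross feature between two one-hot categorical features can be formed as the flattened outer product of their encodings, which lights up exactly the feature combination that is present:

```python
import numpy as np

# Hypothetical categorical features: 3 countries, 2 device types.
country = np.array([0, 1, 0])   # one-hot: country index 1
device = np.array([1, 0])       # one-hot: device index 0

# The cross feature is the flattened outer product: it is 1 only at
# the (country, device) combination that actually occurs.
cross = np.outer(country, device).flatten()
print(cross)  # → [0 0 1 0 0 0]
```

In factorization models, a weight is then learned for each such combination rather than for each feature independently.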

This article introduces a concept called “latent cross,” which directly operates at the embedding layer, thereby simulating the effect of cross-features in the architecture of deep models.

Core Method of the Paper #

The authors first revisit a common modeling assumption in recommendation systems: using cross features to achieve a “low-rank” representation, which is the basic assumption behind matrix factorization. For example, each rating can be expressed as the dot product of a user vector and an item vector.
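This low-rank assumption can be sketched in a few lines (the embedding values below are illustrative, not learned):

```python
import numpy as np

# Low-rank assumption of matrix factorization: each rating is the
# dot product of a learned user vector and a learned item vector.
user_vec = np.array([0.9, -0.2, 0.5])   # hypothetical user embedding
item_vec = np.array([1.0, 0.3, 0.8])    # hypothetical item embedding

predicted_rating = user_vec @ item_vec
print(round(predicted_rating, 2))  # → 1.24
```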

So, the authors proposed the following question: Can a Feedforward Neural Network, which is the foundation of deep learning, effectively simulate this structure?

Through simulation and small-scale experiments, the authors empirically verified that deep learning models do not capture the “low-rank” representation brought about by such cross-features very well. In fact, deep learning models need more layers and wider layers to achieve the same effect as the cross-features. This may come as somewhat of a surprise to us. The authors also made corresponding comparisons with traditional RNNs, which will not be restated here.
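To build intuition for why multiplicative interactions are hard for first-order layers, here is a toy demonstration under my own assumptions (this is not the paper's experiment): the product of two independent inputs is almost entirely unexplained by any linear function of their concatenation, so a network must spend capacity to recover it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent zero-mean features and their multiplicative cross.
u = rng.uniform(-1, 1, 10_000)
v = rng.uniform(-1, 1, 10_000)
target = u * v

# Best linear fit on the concatenated inputs [u, v, 1].
X = np.stack([u, v, np.ones_like(u)], axis=1)
coef, *_ = np.linalg.lstsq(X, target, rcond=None)
residual = target - X @ coef

# The linear model explains almost none of the variance.
rel_error = np.std(residual) / np.std(target)
print(rel_error > 0.9)  # → True
```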

After obtaining these results, the authors proposed an operation called the “latent cross.” The idea is actually very intuitive. Traditional deep learning models concatenate multiple different pieces of information together as input. The “latent cross” instead multiplies the current hidden representation element-wise with the embedding of the contextual information, thereby directly modeling the “cross features.”
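A minimal sketch of this operation, using the `(1 + context)` element-wise form described in the paper (the tensors here are illustrative):

```python
import numpy as np

# Hidden state from the network (e.g., an RNN step) and a context
# embedding (e.g., time of day, device), both of the same dimension.
hidden = np.array([0.5, -1.0, 2.0])
context_emb = np.array([0.2, 0.0, -0.5])

# Latent cross: element-wise multiply the hidden state by
# (1 + context embedding). With a zero context embedding the
# operation reduces to the identity.
crossed = (1.0 + context_emb) * hidden
print(crossed)
```

The `1 +` term means the context acts as a multiplicative correction around the identity, which makes the operation easy to train while still injecting cross-feature-style interactions.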

The benefits of doing this are obvious. Previously, we relied on deep learning models themselves to learn such cross-relationships. Now, the authors directly let contextual information act on input information and other intermediate features, thereby enhancing the impact of contextual information.

The method proposed in this paper can be seen as the first attempt to apply traditional ideas from recommendation systems to the context of deep learning.

Experimental Results of the Methods #

The experiments in this article use Google’s YouTube data. The authors compared a series of methods and found that combining an RNN with the “latent cross” improved results by 2%–3% over using the RNN alone, which is a very significant gain in this setting.

Summary #

Today, I presented a paper from the Google team at WSDM 2018. It discusses how to bring the cross features commonly found in traditional recommendation models (such as matrix factorization) into deep learning.

Let’s recap the key points: First, we briefly introduced the authors of this paper. Second, we detailed the problem the paper aims to solve and its contributions. Third, we analyzed the core method it proposes and the experimental results.

Finally, I leave you with a question to ponder: Is it an inherent limitation of deep models that, by default, they cannot capture cross features well?