024 CVPR 2018 Paper Review - How to Address the Problem of High Computational Complexity in Ranking Learning #

Today, let’s take a look at one of the best paper nominees from this conference, titled “Efficient Optimization for Rank-based Loss Functions”.

First, let’s briefly introduce the authors of this paper, who come from several different academic institutions.

The first author, Pritish Mohapatra, is a Ph.D. student in Computer Science at the International Institute of Information Technology in Hyderabad, India. He has published multiple papers in internationally recognized machine learning conferences such as NIPS, CVPR, ICCV, and AISTATS.

The second author, Michal Rolinek, is a postdoctoral researcher at the Max Planck Institute for Intelligent Systems in Germany. In this paper, the contributions of the first and second authors are equally significant.

C.V. Jawahar is the third author and a professor at the International Institute of Information Technology in India. He is the Ph.D. advisor of the first author, Pritish Mohapatra.

Vladimir Kolmogorov, the fourth author, is a machine learning professor at the Institute of Science and Technology Austria.

The last author, M. Pawan Kumar, is from the University of Oxford.

Main Contributions of the Paper #

The main contribution of this paper is a fast optimization algorithm for rank-based loss functions in ranking learning, that is, loss functions defined over the complete ranked list.

In computer vision, many machine learning tasks require ranking images by preference. In information retrieval and search, ranking is a core problem. Therefore, any major improvement in ranking learning algorithms has wide applications.

Let’s review the three forms of ranking learning algorithms we have learned.

The first type is single-point (pointwise) ranking. This algorithm only judges whether each document is relevant to a specific query keyword. Most single-point ranking algorithms transform the entire problem into a classification or regression problem, which lets them leverage the convenience of large-scale machine learning to quickly learn the ranking function.
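To make the pointwise reduction concrete, here is a minimal sketch: each document becomes a feature vector with a binary relevance label, and an ordinary regression fit yields a scoring function whose scores induce the ranking. The features and labels below are made up purely for illustration.

```python
# Pointwise ranking reduced to regression: each (query, document) pair gets a
# feature vector and a relevance label; a standard regressor learns a scorer.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature vectors for 6 documents and their relevance labels.
X = rng.normal(size=(6, 4))
y = np.array([1, 0, 1, 0, 0, 1])  # 1 = relevant to the query, 0 = not

# Fit a simple least-squares scorer: score(d) = w . x_d
w, *_ = np.linalg.lstsq(X, y, rcond=None)
scores = X @ w

# The ranking is just the documents sorted by predicted score, highest first.
ranking = np.argsort(-scores)
print(ranking)
```

Because the problem is now plain regression, any large-scale learner can be swapped in for the least-squares step.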

The second type is pairwise ranking, which builds on single-point ranking. Since single-point ranking completely ignores the relative relationship between two documents, pairwise ranking instead models the relative relevance, or the difference in relevance, between two documents under the same query keyword.

The third type is listwise ranking, which directly optimizes the ranking objective or metric. Although this method has theoretical advantages, its computational complexity is generally high, and the improvement in ranking performance is relatively limited in practical scenarios. Therefore, many practical applications still use single-point or pairwise ranking.

In this paper, to address the problem of high computational complexity in listwise ranking learning, the authors developed an optimization framework built around a quicksort-flavored algorithm. Under this framework, the computational cost of listwise ranking learning is significantly reduced. The authors then proved that ranking learning with the popular NDCG and MAP metrics fits this framework, which makes fast optimization possible in theory.

Core Method of the Paper #

To understand the core method of this paper, let’s start with the pairwise ranking learning.

For each query keyword, we can construct a document-by-document matrix. Each element of this matrix represents the relationship between two documents under the current query keyword. If the matrix element is +1, the document represented by the corresponding row should be ranked higher than the document represented by the corresponding column. If the matrix element is -1, the row document should be ranked lower than the column document. There is also the case where the matrix element is 0, which means the two documents may be ranked the same. Based on this data, we can derive an overall ranking from all these binary relationships.
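The matrix just described can be built in one line from per-document relevance labels, since the sign of the label difference encodes exactly the +1/-1/0 relationship. The relevance grades below are hypothetical.

```python
# Build the document-by-document preference matrix from graded relevance
# labels: M[i, j] = +1 if doc i should rank above doc j, -1 if below, 0 if tied.
import numpy as np

relevance = np.array([2, 0, 1, 0])  # hypothetical per-document relevance grades
M = np.sign(relevance[:, None] - relevance[None, :]).astype(int)
print(M)
```

Note that the matrix is antisymmetric by construction (`M == -M.T`), which is what lets a consistent overall ranking be read off from the pairwise relations.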

Now let’s look at the core idea of pairwise ranking. For the same query keyword, we randomly select a document from the documents related to this query keyword, and then select a document from the documents unrelated to this query keyword. These two selected documents form a pair. We hope to build a model or function that can always make the value of the function for related documents greater than the value of the function for unrelated documents, for any such pair.
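A standard way to realize this idea is a pairwise hinge loss: for every (relevant, irrelevant) pair, push the parameters so that the relevant document's score exceeds the irrelevant one's by a margin. This is a generic sketch of pairwise training on made-up data, not the paper's own method.

```python
# Pairwise hinge training: for each (relevant, irrelevant) pair, penalize
# w.(x_pos - x_neg) falling below a margin and step against the gradient.
import numpy as np

def pairwise_hinge_grad(w, X_pos, X_neg, margin=1.0):
    """Gradient of sum over pairs of max(0, margin - w.(x_pos - x_neg))."""
    grad = np.zeros_like(w)
    for xp in X_pos:
        for xn in X_neg:
            if margin - w @ (xp - xn) > 0:  # this pair violates the margin
                grad -= (xp - xn)
    return grad

rng = np.random.default_rng(1)
X_pos = rng.normal(loc=+0.5, size=(3, 4))  # hypothetical relevant documents
X_neg = rng.normal(loc=-0.5, size=(3, 4))  # hypothetical irrelevant documents
w = np.zeros(4)
for _ in range(100):
    w -= 0.01 * pairwise_hinge_grad(w, X_pos, X_neg)

# After training, relevant documents should tend to score higher on average.
print((X_pos @ w).mean(), (X_neg @ w).mean())
```

The double loop over pairs is exactly the quadratic cost that the paper's listwise machinery is designed to avoid.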

If we make some changes to this pairwise ranking, we get listwise ranking. First, we still predict the function values for each positively related document and each negatively related document. The difference between these two function values is treated as the element of the predicted pairwise matrix corresponding to these two documents. However, at this point we are no longer concerned with the relationship between the two documents themselves, but with the difference between the ranking represented by the predicted pairwise matrix and the true ranking. The smaller this difference, the smaller the final list-based loss function; the larger the difference, the larger the loss.
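The comparison between the predicted pairwise matrix and the true one can be sketched as a toy list-based loss: count how many ordered pairs the score-induced matrix gets wrong, ignoring pairs that are tied in the ground truth. This is a simplified illustration, not the specific loss the paper optimizes.

```python
# A toy list-based loss: fraction of ground-truth-ordered pairs on which the
# sign of the predicted score difference disagrees with the true preference.
import numpy as np

def list_loss(scores, relevance):
    pred = np.sign(scores[:, None] - scores[None, :])   # predicted pairwise matrix
    true = np.sign(relevance[:, None] - relevance[None, :])
    mask = true != 0                                    # ignore ties in the truth
    return float(np.mean(pred[mask] != true[mask]))

relevance = np.array([2, 0, 1, 0])                      # hypothetical labels
print(list_loss(np.array([3.0, 0.1, 2.0, -1.0]), relevance))  # perfect order: 0.0
print(list_loss(np.array([-1.0, 2.0, 0.0, 3.0]), relevance))  # reversed order: 1.0
```

Metrics like MAP and NDCG play the role of this disagreement count in the paper, but the structure, comparing a score-induced ranking against the true one, is the same.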

How can we optimize this list-based loss function so as to obtain the optimal score for each document? This is a core challenge in listwise ranking learning.

One way to optimize it is to find which document pairs violate the ordering principle under the current scoring function. What does it mean to violate the ordering principle? As mentioned above, the model is supposed to rank positively related documents ahead of negatively related ones. If the function has not been learned well, a negatively related document may end up ranked ahead of a positively related one; such a pair is said to violate the ordering principle.

If we find such pairs, we can adjust the parameters of the function so that these violations no longer occur. Obviously, when there are many such pairs, finding the most severely violated one, that is, the pair where the function value of the negatively related document exceeds that of the positively related document by the largest margin, is the most helpful for improving the parameters. So the key becomes how to find this most severe violation (the most-violating ranking).
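For the single worst pair, there is a simple observation worth sketching: the maximum of f(neg) - f(pos) over all pairs is always attained at the highest-scoring negative and the lowest-scoring positive, so one linear pass over each list suffices instead of scanning all P×N pairs. The scores below are made up.

```python
# Find the most severely violated pair without enumerating all P*N pairs:
# the maximum of f(neg) - f(pos) pairs the best-scoring negative document
# with the worst-scoring positive document.
import numpy as np

def most_violating_pair(pos_scores, neg_scores):
    i = int(np.argmin(pos_scores))   # worst-ranked relevant document
    j = int(np.argmax(neg_scores))   # best-ranked irrelevant document
    return i, j, float(neg_scores[j] - pos_scores[i])

pos = np.array([2.0, 0.3, 1.5])      # hypothetical scores of relevant docs
neg = np.array([0.9, -1.0, 0.1])     # hypothetical scores of irrelevant docs
print(most_violating_pair(pos, neg))
```

The paper's contribution tackles the harder version of this problem, where the quantity being maximized is an entire rank-based loss rather than a single pair's score gap.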

The authors developed a framework for this task, called the “Fast Rank-based Mechanism”. Specifically, they found that the most severe violation must satisfy certain principles, and that it can be found by applying a quicksort-style procedure to the current data sequence. There are many details involved, so reading the original paper is recommended if you are interested. The key point to remember is that this mechanism achieves the time complexity of quicksort when finding the most severe violation.

So, do most ranking metrics conform to this mechanism? The authors’ answer is that the widely used MAP and NDCG both do. The paper provides proofs of this, so we can use its conclusions directly.
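For reference, here is how these two metrics are computed for a single query from a score-induced ranking: average precision (AP, whose mean over queries is MAP) and NDCG with the common exponential-gain, logarithmic-discount form. The labels and scores are toy values.

```python
# Average precision (AP) and NDCG for one query, computed from scores.
import numpy as np

def average_precision(relevance, scores):
    order = np.argsort(-scores)                       # rank docs by score
    rel = np.asarray(relevance, dtype=float)[order]
    hits = np.cumsum(rel)                             # relevant docs seen so far
    precision_at_k = hits / (np.arange(len(rel)) + 1)
    return float(np.sum(precision_at_k * rel) / rel.sum())

def ndcg(relevance, scores):
    order = np.argsort(-scores)
    rel = np.asarray(relevance, dtype=float)
    discounts = 1.0 / np.log2(np.arange(len(rel)) + 2)
    dcg = np.sum((2.0 ** rel[order] - 1) * discounts)
    ideal = np.sum((2.0 ** np.sort(rel)[::-1] - 1) * discounts)
    return float(dcg / ideal)

relevance = [1, 0, 1, 0]                              # hypothetical labels
perfect = np.array([4.0, 1.0, 3.0, 0.5])              # ranks both relevant docs first
print(average_precision(relevance, perfect))          # 1.0
print(ndcg(relevance, perfect))                       # 1.0
```

Both metrics depend on the documents only through their rank positions, which is precisely the structure the paper exploits for fast loss-augmented optimization.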

Experimental Results #

The authors conducted experiments on the PASCAL VOC 2011 dataset, mainly comparing direct single-point ranking against list-based optimization using the fast algorithm proposed in this paper. In this comparison, the proposed method has a clear advantage: it achieves the performance of list-based optimization with the time complexity of single-point ranking.

Summary #

Today, I have talked with you about one of the best paper nominees from CVPR 2018.

Let’s review the key points together: First, the main contribution of this paper is a fast optimization algorithm for loss functions based on the entire ranking. Second, the core of the proposed method is a framework called the “Fast Ranking Mechanism”. Third, we briefly introduced the experimental results of the paper.

Finally, I leave you with a thinking question. Recall the LambdaMART algorithm we discussed before. There is actually a step in it that involves finding pairwise violations of the ranking principle. Can you remember what that step is?