047 Overview of Multistage Ranking Systems


On Monday, I introduced a macro-level classification of search systems: traditional text-matching information retrieval systems and machine-learning information retrieval systems. This classification gives you a clear picture of the historical development of information search systems and of the characteristics of these two types of search systems.

Today, we will analyze another framework for search systems: the multistage ranking system.

Overview of Multistage Ranking Systems #

What is a multistage ranking system? Why do search systems need multiple rounds of scoring?

Let’s take the machine learning search system introduced last time as an example. Generally speaking, the purpose of such a system is to use machine learning models to predict the relevance between documents and a query keyword. Ideally, for each query keyword, we would score every document in the data set.

In an application scenario like an internet search engine, this would mean that, in theory, each query keyword requires scoring hundreds of millions or even billions of web pages. The sheer volume alone makes this unrealistic.

On the other hand, the machine learning models that currently perform best on ranking problems, particularly tree models such as GBDT (gradient-boosted decision trees) and neural networks, have high computational cost. Even scoring a relatively large number of documents (several thousand or tens of thousands) within a real-time response budget (e.g., a few hundred milliseconds) is difficult. Scoring the hundreds of millions or billions of documents in the full data set, as mentioned above, is harder still.
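To make the cost argument concrete, here is a back-of-envelope latency estimate. The per-document cost and the latency budget below are illustrative assumptions, not figures from the text:

```python
# Back-of-envelope latency estimate (illustrative numbers only):
# if a complex model needs ~0.1 ms per document, scoring the whole
# corpus blows far past a few-hundred-millisecond budget.
per_doc_ms = 0.1   # assumed cost of one GBDT/neural scoring call
budget_ms = 300.0  # assumed end-to-end latency budget

for n_docs in (1_000, 10_000, 1_000_000_000):
    total_ms = n_docs * per_doc_ms
    verdict = "within" if total_ms <= budget_ms else "over"
    print(f"{n_docs:>13,} docs -> {total_ms:>14,.0f} ms ({verdict} budget)")
```

Even at this optimistic per-document cost, only the smallest candidate set fits the budget, which is exactly why a cheap pruning round must come first.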

In such cases, we need a mechanism that can quickly select several hundred to several thousand documents (depending on the application) for each query keyword. Complex models can then be applied to score and rank only this subset. This process of scoring documents in two rounds is called the “two-stage scoring framework” (see reference [3]).
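The two-round idea can be sketched in a few lines. Both scorers below are hypothetical stand-ins (term overlap for the cheap first-round model, a toy refinement for the expensive second-round model), not methods prescribed by the text:

```python
# Minimal two-stage scoring sketch with hypothetical scorers.
# Stage 1: a cheap score prunes the corpus to K candidates.
# Stage 2: an expensive model re-scores only those K.

def cheap_score(doc, query):
    # stands in for a simple first-stage model: raw term overlap
    return len(set(doc.split()) & set(query.split()))

def expensive_score(doc, query):
    # stands in for a GBDT/neural model: here just a toy refinement
    return cheap_score(doc, query) + 0.5 * (query.split()[0] in doc)

def two_stage_rank(corpus, query, k=2):
    candidates = sorted(corpus, key=lambda d: cheap_score(d, query),
                        reverse=True)[:k]                   # stage 1: Top-K
    return sorted(candidates, key=lambda d: expensive_score(d, query),
                  reverse=True)                             # stage 2: re-rank

docs = ["learning to rank", "rank aggregation",
        "cooking pasta", "deep learning to rank search"]
print(two_stage_rank(docs, "learning to rank", k=2))
```

The expensive model runs on `k` documents regardless of corpus size, which is the whole point of the cascade.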

The first round of scoring is often referred to as “Top-K extraction.” As you can see, in this mechanism the first round must use relatively simple models and methods, because this round potentially runs over the entire data set. It is also the most heavily optimized round, since it must quickly find several hundred or several thousand suitable documents in a massive collection.
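One standard way to extract the Top-K without fully sorting a huge stream of scored documents is a bounded min-heap. This is a generic sketch of that data-structure trick, not a specific system from the text:

```python
import heapq

# Streaming Top-K extraction with a bounded min-heap: the first round
# never materializes a full sort of the corpus, only K (score, id) pairs.
def top_k(scored_docs, k):
    """scored_docs: iterable of (score, doc_id); returns the K best by score."""
    heap = []
    for score, doc_id in scored_docs:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # new document beats the current minimum: evict it
            heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True)

stream = [(0.2, "d1"), (0.9, "d2"), (0.5, "d3"), (0.7, "d4"), (0.1, "d5")]
print(top_k(stream, k=3))
```

Each document costs O(log K) at most, so the pass over N documents is O(N log K) rather than O(N log N).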

Then, in the second round, once the number of documents has been reduced to a few thousand or even a few hundred, more complex models can be applied. This is in fact one of the main objectives of the whole multistage process: to apply complex models to a data set of manageable size.

In practice, we are not limited to two rounds of scoring; the idea extends to more rounds, such as the Yahoo search engine’s “three-round scoring mechanism” (see reference [1]). There, the third round further improves result quality by using “contextual features” generated from the second round’s scores. Similar ideas can be found in reference [2].

In general, a multistage ranking system has two salient characteristics. First, each round operates on fewer documents than the previous one; the point of the cascade is to filter the candidate set down round by round. Second, each round uses richer features and more complex models than the previous one.

Round One: “Top-K Extraction” #

I just discussed the overall mechanism of the multistage ranking system. Now let’s look at the first round of scoring, also known as “Top-K extraction,” and its technical characteristics.

The core problem of Top-K extraction is how to quickly return several hundred to several thousand valuable documents from a very large data set. This places requirements on both the data structures used to retrieve documents and the models applied to them.

First, the “inverted index” is a crucial mechanism here. Whether an effective index can be built is critical to the success of the first round of scoring.

Traditional inverted indexes can already prune away unnecessary documents to a large extent. Let me briefly review this basic data structure. A “term” in the index corresponds to a query keyword, and each term maps to a list of the documents that contain it.

This document list is usually sorted in some meaningful order: for example, a document with high relevance to the term will appear earlier in the list. Of course, not every document that contains the term is necessarily included in the list. And the reason this is called an “index” is that the list does not store the documents themselves, but typically only their identifiers.
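A minimal sketch of this structure, assuming whitespace tokenization and union-of-postings retrieval (real indexes sort postings by relevance and compress them heavily, which I omit here):

```python
from collections import defaultdict

# Minimal inverted index: each term maps to a posting list of document
# identifiers, not full documents, as described above.
def build_index(docs):
    """docs: {doc_id: text}; returns {term: [doc_id, ...]}."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index[term].append(doc_id)
    return index

def retrieve(index, query):
    # union of the posting lists of the query terms
    hits = set()
    for term in query.lower().split():
        hits.update(index.get(term, []))
    return sorted(hits)

docs = {1: "ranking relevance in search", 2: "two stage ranking", 3: "pasta recipes"}
idx = build_index(docs)
print(retrieve(idx, "ranking search"))
```

The key property is that a query only touches the posting lists of its own terms, never the rest of the collection.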

In addition to extracting documents through the basic index, we can also use some simple models, such as linear models. A classic method here is the “WAND operator” (see reference [4]).

Strictly speaking, the WAND operator does not apply an arbitrary linear model directly to the index. Rather, if the model can be simplified to a linear model with only positive coefficients, the whole model can be viewed as the dot product of two vectors, and WAND is an optimization of that dot product computation over the index.
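The core pruning idea can be illustrated with a simplified sketch. This is not the full WAND algorithm from reference [4] (which advances posting-list cursors in a specific pivot-based way); it only shows the upper-bound trick that makes WAND work: with non-negative weights, the sum of per-term score upper bounds limits a document’s final score, so documents whose bound falls below the current Top-K threshold are skipped without full evaluation.

```python
# Simplified WAND-style upper-bound pruning (illustrative, not the full
# algorithm): skip a document when even its most optimistic score cannot
# enter the current Top-K.
def wand_like(postings, weights, k):
    """postings: {term: {doc_id: term_score}}; weights: non-negative per term."""
    # per-term upper bound: weight * maximum term score in its posting list
    ub = {t: weights[t] * max(p.values()) for t, p in postings.items()}
    threshold, top = 0.0, []  # top: (score, doc_id), ascending, length <= k
    for d in sorted({d for p in postings.values() for d in p}):
        bound = sum(ub[t] for t, p in postings.items() if d in p)
        if len(top) == k and bound <= threshold:
            continue  # cannot beat the current Top-K: skip full scoring
        score = sum(weights[t] * p[d] for t, p in postings.items() if d in p)
        top = sorted(top + [(score, d)])[-k:]
        threshold = top[0][0] if len(top) == k else 0.0
    return list(reversed(top))

postings = {"rank": {1: 2.0, 2: 1.0}, "search": {1: 1.0, 3: 3.0}}
weights = {"rank": 1.0, "search": 0.5}
print(wand_like(postings, weights, k=2))
```

The pruning is only safe because every coefficient is non-negative, which is exactly the simplification the text describes.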

Of course, developers do not want to stop at linear models on inverted indexes. In fact, over the years there have been many attempts to apply tree models directly during index traversal. However, because of the performance constraints mentioned earlier, tree models generally cannot be applied directly (see reference [5] for further reading). It is fair to say that optimizing tree models for this setting is still an active research topic.

Re-Ranking in the Second and Later Rounds #

After the first round, we enter the second phase, often called the “re-ranking” phase. At this point the candidate documents have been retrieved from the index and loaded into memory. In a typical architecture, all of the hundreds or thousands of candidates are gathered into the memory of a single machine.

To determine which models suit which scenario, it is important to understand the key difference between the first and second rounds.

First, the first round must operate on the inverted index. Modern indexes are often sharded across multiple nodes (machines); that is, each node holds a portion, but not the whole, of the document collection. This means that, of the machine learning methods we previously discussed (pointwise, pairwise, and listwise), the pairwise and listwise approaches are difficult to apply directly at the index level, because for computational efficiency each node can only access and score its own portion of the documents.

Therefore, the main difference between the two rounds is that the first round generally scores documents individually, while only the second round can leverage pairwise or listwise methods over the full candidate set. As mentioned before, pairwise and listwise methods tend to be more effective than the pointwise method, so balancing effectiveness and cost across the two rounds becomes increasingly important.
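The distinction can be made concrete with a toy example. The normalization below is a hypothetical stand-in for set-dependent scoring, not a specific method from the text; the point is only that it needs every candidate in memory at once, which sharded first-round nodes cannot provide:

```python
# Why the second round can use set-dependent scoring while the first cannot:
# a listwise-style feature depends on the whole candidate set at once.

def pointwise_scores(raw):
    # each document scored in isolation, as in the first round
    return list(raw)

def listwise_normalized(raw):
    # score relative to the candidate set: requires all candidates in memory
    lo, hi = min(raw), max(raw)
    span = (hi - lo) or 1.0  # guard against a constant score list
    return [(s - lo) / span for s in raw]

raw = [3.0, 1.0, 2.0]
print(pointwise_scores(raw))
print(listwise_normalized(raw))
```

A first-round shard seeing only `[3.0]` could never compute the normalized values, because `min` and `max` range over candidates held by other nodes.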

Let me briefly mention the other rounds after the second round. Once we have applied the second round, we have essentially generated the final result set. Why do we still need additional rounds?

There are at least two reasons why we might need additional rounds.

First, in many search systems, relevance ranking is only one objective. The system may also need to introduce “diversification” or other “business rules,” and it is difficult to fully implement these rules, or to reorder results accordingly, in the earlier rounds.

Second, after the final document set is generated, there is evidence (see reference [1]) that more refined features can be computed on it to further improve ranking accuracy. Multiple rounds of scoring are therefore worth exploring.
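As one example of a later-round diversification pass, here is a sketch in the style of Maximal Marginal Relevance, a standard greedy technique (the text does not prescribe this particular method; the relevance and similarity inputs below are toy assumptions):

```python
# MMR-style diversification sketch: greedily pick the document that best
# balances relevance against similarity to what is already selected.
def diversify(docs, relevance, similarity, lam=0.7):
    """docs: ids; relevance: id -> float; similarity: (a, b) -> float in [0, 1]."""
    remaining, selected = list(docs), []
    while remaining:
        def mmr(d):
            sim = max((similarity(d, s) for s in selected), default=0.0)
            return lam * relevance[d] - (1 - lam) * sim
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected

relevance = {"a1": 0.9, "a2": 0.85, "b": 0.6}
sim = lambda x, y: 1.0 if x[0] == y[0] else 0.0  # toy: same topic prefix
print(diversify(["a1", "a2", "b"], relevance, sim))
```

Note how the near-duplicate "a2" is pushed below the less relevant but novel "b": exactly the kind of reordering that is awkward to express in a purely relevance-driven earlier round.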


Summary #

Today I introduced a very important idea in modern search technology: the multistage ranking system. Let’s review the key points together: first, why we need multiple rounds of scoring and the core idea behind the approach; second, the characteristics of the first round, the second round, and subsequent rounds.

Finally, I’ll leave you with a question to ponder: How can we evaluate the performance of the first-round model in a multi-round scoring system?

References #

  1. Dawei Yin, Yuening Hu, Jiliang Tang, Tim Daly, Mianwei Zhou, Hua Ouyang, Jianhui Chen, Changsung Kang, Hongbo Deng, Chikashi Nobata, Jean-Marc Langlois, and Yi Chang. Ranking Relevance in Yahoo Search. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16). ACM, New York, NY, USA, 323-332, 2016.
  2. Ruey-Cheng Chen, Luke Gallagher, Roi Blanco, and J. Shane Culpepper. Efficient Cost-Aware Cascade Ranking in Multi-Stage Retrieval. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’17). ACM, New York, NY, USA, 445-454, 2017.
  3. Van Dang, Michael Bendersky, and W. Bruce Croft. Two-Stage Learning to Rank for Information Retrieval. Proceedings of the 35th European Conference on Advances in Information Retrieval (ECIR ’13), Pavel Serdyukov, Pavel Braslavski, Sergei O. Kuznetsov, Jaap Kamps, and Stefan Rüger (Eds.). Springer-Verlag, Berlin, Heidelberg, 423-434, 2013.
  4. Andrei Z. Broder, David Carmel, Michael Herscovici, Aya Soffer, and Jason Zien. Efficient Query Evaluation Using a Two-Level Retrieval Process. Proceedings of the Twelfth International Conference on Information and Knowledge Management (CIKM ’03). ACM, New York, NY, USA, 426-434, 2003.
  5. N. Asadi, J. Lin, and A. P. de Vries. Runtime Optimizations for Tree-Based Machine Learning Models. IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 9, 2281-2292, 2014.