
028 ACL 2018 Paper Review - How to Pose Good Questions in Question-Answering Systems #

The 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018) took place in Melbourne, Australia from July 15th to 20th. This conference is one of the top events in the field of natural language processing and computational linguistics.

The Association for Computational Linguistics (ACL) was established in 1962 and sponsors various academic conferences and workshops every year. The ACL conference is the association’s flagship event and a premier venue for following each year’s developments in natural language processing.

This year, the conference received 1018 long paper submissions and 526 short paper submissions. In the end, the conference accepted 256 long papers and 125 short papers, resulting in an overall acceptance rate of 24.7%.

Today, let’s take a look at one of the best papers from this conference, titled “Learning to Ask Good Questions: Ranking Clarification Questions using Neural Expected Value of Perfect Information”.

First, let me give you a brief introduction to the authors of the paper.

The first author, Sudha Rao, is a Ph.D. candidate in the Department of Computer Science at the University of Maryland, College Park. She has published multiple papers at conferences such as ACL, EMNLP, and NAACL, and has interned at Microsoft Research.

The second author is Sudha Rao’s advisor, Hal Daumé III, a professor in the Department of Computer Science at the University of Maryland, College Park, who also works at Microsoft Research in New York. He is an expert in machine learning and natural language processing and has published numerous research papers across these fields, with more than 9,000 citations.

Main Contributions of the Paper #

This paper focuses on “Question & Answering” systems. Question & Answering systems are not only popular among users in practical domains, leading to well-known online Q&A services like Quora, Zhihu, and Stack Overflow, but they are also attracting attention from researchers in the field of artificial intelligence system development.

We have mentioned the “Turing test” before, which is used to determine whether a system or robot exhibits true artificial intelligence. That test is essentially built on humans and machines interacting in a question-and-answer setting. Building effective question-answering systems has therefore always been one of the core topics in artificial intelligence research, and in natural language processing research in particular.

The authors of this paper argue that, within a question-answering system, an important mechanism is to ask “clarification questions” about a question that has already been posted, in order to guide other responders toward more useful answers. In other words, what the authors study is how to find the clarification questions that play this bridging role. This is the first major contribution of the paper.

The second major contribution is the use of a decision-theoretic framework and the “Expected Value of Perfect Information” (EVPI) to measure how much useful information a clarification question can add to the original post. In short, EVPI is the measure this paper proposes for quantifying that usefulness.

The third contribution is the construction of a dataset of 77,000 clarification-question instances from the Stack Exchange platform (of which Stack Overflow is one sub-site). The authors selected 500 samples from this dataset for evaluation and found that the proposed model significantly outperformed similar algorithms previously used in question-answering systems.

Core Method of the Paper #

Since one of the core contributions of this paper is introducing the idea of “clarification questions” into the development of question-answering systems, what exactly is a clarification question?

In fact, the authors do not give a formal definition of “clarification question” in the paper; they only explain it through an example.

For example, a user on the Ask Ubuntu forum asks about a problem encountered while installing the APE package. If we want to ask clarification questions at this point, what kind of questions would prompt the original poster to supply the information that helps others actually answer the original question?

Let’s look at several candidate questions posed from different angles: asking for the specific version of Ubuntu the user is running; asking for the user’s WiFi card information; or asking whether the user is running Ubuntu on an x86 system.

In this scenario, the latter two questions either add no useful information to the original post or are simply irrelevant, whereas the first question, about the specific Ubuntu version, is clearly something the user can provide and helps whoever answers the question narrow down the problem.

This leads to the second contribution of the paper: how to measure the value of a candidate clarification question.

To answer this, note that the model works with two sets of posts. The first is the candidate set of clarification questions; the second is the candidate set of final answers. The ultimate goal is to obtain the best final answer, and clarification questions serve as the “bridge” in this process.

Therefore, the authors define an Expected Value of Perfect Information (EVPI) for each candidate clarification question, which measures the “expected value” of asking that question. Why an expected value? Because there is an uncertain factor here: different clarification questions may elicit different answers, so the authors express this value probabilistically.

In other words, the core of EVPI is to compute, for each candidate final answer, the probability of that answer given the original post and a clarification question, multiplied by the “benefit” that answer brings. Summing this product over all candidate final answers gives the EVPI of the clarification question. Put differently, the EVPI of a clarification question is the probability-weighted expected benefit over all the final answers it could elicit.
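Written out, with p denoting the original post, q_i a candidate clarification question, A a set of candidate answers, and U the utility of the post once it has been updated with an answer, the EVPI score described above takes roughly this form:

```latex
% EVPI of a candidate clarification question q_i for an original post p:
% each candidate answer a_j contributes its probability (given p and q_i)
% times the utility of the post p updated with that answer.
\mathrm{EVPI}(q_i \mid p) = \sum_{a_j \in A} \mathbb{P}\left[ a_j \mid p, q_i \right] \, \mathbb{U}(p + a_j)
```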

From the above definition, there are two unknowns. First, we do not know the conditional probability of a given final answer given the original post and a clarification question. Second, we do not know the benefit of that answer. The authors therefore use two neural network models to jointly learn these two quantities, which can be considered the modeling innovation of this paper.

Specifically, the authors first use LSTMs to produce representation vectors for the original post, the candidate clarification questions, and the final answers. The representations of the original post and a candidate clarification question are then combined through a neural network into a joint representation. Finally, the authors define an objective function to optimize these representations.
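To make this architecture more concrete, here is a minimal PyTorch-style sketch of the setup just described. It is an illustration under my own assumptions (the layer sizes and the names `Encoder` and `AnswerPredictor` are invented for this sketch), not the authors’ actual implementation:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Encode a token-id sequence into a single vector with an LSTM."""
    def __init__(self, vocab_size, emb_dim=200, hidden_dim=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):
        embedded = self.embed(token_ids)           # (batch, seq_len, emb_dim)
        _, (h_n, _) = self.lstm(embedded)
        return h_n[-1]                             # (batch, hidden_dim)

class AnswerPredictor(nn.Module):
    """Combine post and clarification-question vectors into a predicted answer vector."""
    def __init__(self, hidden_dim=200):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, post_vec, question_vec):
        return self.mlp(torch.cat([post_vec, question_vec], dim=-1))

# One encoder each for posts, clarification questions, and answers,
# plus a predictor that maps (post, question) to an answer representation.
post_enc = Encoder(vocab_size=30000)
question_enc = Encoder(vocab_size=30000)
answer_enc = Encoder(vocab_size=30000)
predictor = AnswerPredictor()
```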

The objective is for the predicted answer representation to be close to the representation of the answer actually observed, and also close to the representations of final answers whose questions are similar to the original post. In other words, if two questions have similar representations, the representations of their answers should also be similar.
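As a rough illustration of that objective, one could penalize the weighted squared distance between the predicted answer representation and the representations of the observed answer and of answers to similar questions. This is a sketch under my own assumptions about the exact weighting, not the paper’s precise loss:

```python
import torch

def answer_prediction_loss(pred_answer_vec, answer_vecs, weights):
    """Weighted squared distance between the predicted answer representation
    and a set of reference answer representations.

    pred_answer_vec: (hidden_dim,) prediction for one (post, question) pair
    answer_vecs:     (k, hidden_dim) the observed answer plus answers to similar questions
    weights:         (k,) e.g. 1.0 for the observed answer, smaller for similar ones
    """
    squared_dists = ((answer_vecs - pred_answer_vec) ** 2).sum(dim=-1)  # (k,)
    return (weights * squared_dists).sum()

# Toy usage with random vectors:
pred = torch.randn(200)
refs = torch.randn(5, 200)
w = torch.tensor([1.0, 0.5, 0.5, 0.25, 0.25])
loss = answer_prediction_loss(pred, refs, w)
```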

So what counts as a similar question? The authors use the Lucene information-retrieval tool to find questions similar to a given original post. Since no ground-truth labels are available, they rely on heuristics of this kind to annotate the data, so that the model can learn whether two questions are related.
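The paper uses Lucene, which is a Java library; as a rough Python stand-in, a BM25 retriever such as the `rank_bm25` package can play the same role of pulling up similar posts. This is only an illustrative sketch of the retrieval step, not the authors’ setup:

```python
# pip install rank-bm25
from rank_bm25 import BM25Okapi

# A toy corpus of existing posts (in practice, all posts in the dataset).
corpus = [
    "wifi not working after installing ubuntu 16.04",
    "how to set environment variables permanently in ubuntu",
    "package installation fails with unmet dependencies",
]
tokenized_corpus = [doc.split() for doc in corpus]
bm25 = BM25Okapi(tokenized_corpus)

query = "error installing package on ubuntu".split()
# Retrieve the most similar existing posts for the new post.
similar_posts = bm25.get_top_n(query, corpus, n=2)
print(similar_posts)
```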

Experimental Results of the Paper #

The authors used Stack Exchange to construct a dataset for studying clarification questions. The basic idea is: if the original post was later edited by its author, then a question raised in the comments on that post is treated as a clarification question, and the edited post is treated as a post that was improved because of the clarification question. Admittedly, this is a rather rough criterion for data collection. When the original post has been edited by its author and has received replies as a result of that edit, those replies are treated as final answers. Following this procedure, the authors collected more than 77,000 posts.
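For illustration, the heuristic just described might be sketched as follows. The field names (`edited_body`, `comment_questions`, `answers`) are hypothetical placeholders for whatever a Stack Exchange data dump actually provides, so treat this as a sketch of the idea rather than the authors’ extraction code:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Post:
    body: str
    edited_body: Optional[str]      # non-None if the author later edited the post
    comment_questions: List[str]    # comments on the post that ask a question
    answers: List[str]              # answers posted after the edit

def extract_triples(posts: List[Post]):
    """Rough sketch of the dataset heuristic described above:
    an edited post + a question asked in its comments + the post-edit answers
    become (post, clarification question, answer) training instances."""
    triples = []
    for post in posts:
        if post.edited_body is None or not post.comment_questions:
            continue                 # only keep posts the author actually edited
        for question in post.comment_questions:
            for answer in post.answers:
                triples.append((post.body, question, answer))
    return triples
```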

The authors compared the proposed model with other classical models. The final conclusion is that the proposed model identifies the best clarification questions more reliably, and that it performs better than simply using a neural network to match the original post against candidate clarification questions.

Summary #

Today I presented to you one of the best papers from ACL 2018.

Let’s recap the key points: First, the paper proposes the concept of “clarification questions” to aid in the development of question answering systems. Second, the paper presents a series of methods to describe and evaluate clarification questions. Third, the paper builds a dataset and experimentally demonstrates the effectiveness of the proposed methods.

Finally, I leave you with a question to ponder: Can you come up with a definition for clarification questions based on the introduction in this article?