
022 CVPR 2018 Paper Review - How to Study Relationships between Computer Vision Tasks #

From June 18 to 22 this year, the Conference on Computer Vision and Pattern Recognition (CVPR) was held in Salt Lake City, USA. First held in 1985, CVPR has run for over 30 years and is one of the top conferences in the field of computer vision.

In recent years, CVPR has grown into a major event in the field of artificial intelligence. Riding the wave of interest in AI, both the number of submissions and the number of attendees have increased significantly. This year, the conference received 3,300 paper submissions and accepted 979, an acceptance rate of nearly 30%. Of the accepted papers, 70 were selected for oral presentations and 224 for spotlight presentations. Attendance has grown by nearly 1,000 people per year over the past two years, reaching more than 6,000 this year, over three times the 2014 figure. The reviewer pool, meanwhile, reached a remarkable 10,000.

In addition to the main conference, CVPR also organized 21 tutorials, 48 workshops, and a doctoral forum with sponsorship from over 115 companies.

With so many papers, finding the most valuable and influential work is like searching for a needle in a haystack. Here, I have selected three papers from this year's CVPR for you, hoping they can serve as a starting point for further exploration.

Today, let’s share the best paper of the conference, titled “Taskonomy: Disentangling Task Transfer Learning.”

First, let me briefly introduce the authors of the paper.

The first author, Amir R. Zamir, is currently a postdoctoral researcher at Stanford University and the University of California, Berkeley. He has published over 30 papers in the field of computer vision and received the Best Student Paper Award at CVPR 2016.

The second author, Alexander Sax, has just graduated with a master’s degree from the Stanford University Computer Science Department and will soon pursue a doctoral degree at the University of California, Berkeley. He has already published two CVPR papers as a master’s student.

The third author, Bokui Shen, has just completed his undergraduate degree in the Stanford University Computer Science Department and will stay at Stanford for his Ph.D. Despite being a fresh graduate, he has already published two CVPR papers and one ICCV paper.

The fourth author, Leonidas Guibas, is a professor in the Stanford University Computer Science Department and an ACM and IEEE Fellow. He is also a member of the National Academy of Engineering and the National Academy of Sciences. His Ph.D. advisor was Donald Knuth, the Turing Award winner.

The fifth author, Jitendra Malik, is a professor in the University of California, Berkeley, Computer Science Department and an ACM and IEEE Fellow. He is also a member of the National Academy of Engineering and the National Academy of Sciences. Malik is an academic authority in the field of computer vision.

The last author, Silvio Savarese, is a professor in the Stanford University Computer Science Department. His research focuses on computer vision and computer graphics. He is also the husband of Fei-Fei Li, the Chinese-American scholar familiar to many readers.

Main Contributions of the Paper #

In summary, this paper investigates the relationships between computer vision tasks and proposes a computational framework that quantitatively learns the similarities between them. These similarities can then be exploited to improve performance on tasks with limited data. This is essentially the core idea of transfer learning: transferring knowledge from tasks or domains that have already been learned to tasks or domains where data is scarce and learning is harder.

Many researchers have the intuition that certain computer vision tasks are logically connected. For example, object recognition, depth estimation, edge detection, and pose estimation are generally believed to be related tasks. The relationships between other visual tasks, however, are less intuitive; it is not obvious, for instance, how edge detection and shading estimation could help with pose estimation.

If we try to solve each task separately, the effort will inevitably be substantial. This paper shows that many tasks are in fact correlated, and exploiting this correlation can bring great savings in data. In other words, we can learn more tasks with less data. From this perspective, transfer learning also brings hope for new tasks: even without a large amount of manually annotated data, we can still achieve good results.

Another important contribution of this paper is a computational framework that requires no prior knowledge, such as deciding in advance which two tasks are related, or probabilistic modeling methods that require prior probabilities of task relationships. The proposed framework starts purely from data and results, thereby avoiding the incompleteness and inaccuracy of prior information.

Core Methods of the Paper #

The method proposed in this paper consists of four components: task-specific modeling, transfer modeling, ordinal normalization of task relationships, and finally computing the task relationship graph. Each component has a different objective.

First, a model is trained for each individual task. These models have two objectives: the first is to maximize accuracy on the task itself, and the second is to extract a representative intermediate representation that can later support transfer learning.
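To make this concrete, here is a minimal sketch in PyTorch of what such a task-specific model might look like: an encoder that produces the intermediate representation, and a decoder that maps it to the task output. The architecture and layer sizes are illustrative assumptions on my part, not the networks actually used in the paper.

```python
import torch
import torch.nn as nn

class TaskSpecificNet(nn.Module):
    """Encoder-decoder model for a single vision task.

    The encoder output is the intermediate representation that the
    later transfer models build on. All layer sizes are illustrative
    assumptions, not the paper's actual architecture.
    """

    def __init__(self, repr_dim=512, out_channels=1):
        super().__init__()
        # Encoder: image -> compact intermediate representation.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, repr_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        # Decoder: representation -> task output (e.g. a depth map).
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(repr_dim, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(64, out_channels, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, x):
        representation = self.encoder(x)  # kept for transfer modeling
        prediction = self.decoder(representation)
        return prediction, representation
```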

The second component is transfer modeling. This part takes the intermediate representation learned in the first component and transfers it to a target task. Beyond first-order transfers that use a single source task, the authors also propose higher-order transfers that combine multiple source tasks to achieve better performance, effectively associating several source tasks with one target task.
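Continuing the assumptions above, the sketch below shows one way such a transfer model could be set up: the source encoders are frozen, and only a small readout head is trained on the target task. Concatenating several source representations is a simple illustrative stand-in for higher-order transfers; the paper's exact combination mechanism may differ.

```python
import torch
import torch.nn as nn

def build_transfer_head(source_encoders, repr_dim=512, out_channels=1):
    """Create a small readout network for a (possibly higher-order) transfer.

    source_encoders: frozen encoders from already-trained source tasks.
    Concatenating their representations is one simple way to combine
    multiple sources; this is an illustrative choice.
    """
    for encoder in source_encoders:
        for param in encoder.parameters():
            param.requires_grad = False  # sources stay frozen; only the head trains
    return nn.Sequential(
        nn.Conv2d(repr_dim * len(source_encoders), 64, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Conv2d(64, out_channels, kernel_size=3, padding=1),
    )

def transfer_forward(source_encoders, head, images):
    # Source representations are computed without gradients,
    # so training updates only the readout head.
    with torch.no_grad():
        reps = [encoder(images) for encoder in source_encoders]
    return head(torch.cat(reps, dim=1))
```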

The third component is task relationship normalization, and it is one of the highlights of the paper. After obtaining the transfer-learning results, we can use the pairwise relationships to build a matrix representing the relationships between all tasks. However, the raw values of the transfer loss between two tasks are not directly comparable across task pairs. Simple min-max normalization to the range [0, 1], common in machine learning, is also unsuitable, because the scale of a task's loss bears no consistent relationship to actual accuracy on the target task.

Therefore, the authors propose an ordinal normalization method based on the Analytic Hierarchy Process (AHP). Simply put, instead of comparing absolute transfer values, they ask, for each pair of candidate source tasks, which one yields higher accuracy on the target task's test set. This makes all tasks comparable. In short, the purpose of task relationship normalization is to construct a matrix that faithfully represents the relationships between tasks.
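As a rough illustration of this idea, the following sketch scores candidate source tasks for one target by pairwise wins on a test set, then extracts comparable affinities from the principal eigenvector of the resulting matrix, in the spirit of the paper's AHP-based ordinal normalization. The loss values and implementation details are made up for the example.

```python
import numpy as np

def ordinal_affinities(losses):
    """Ordinal normalization for one target task (simplified sketch).

    losses: array of shape (num_sources, num_test_images) holding each
    source task's per-image transfer loss on the target. Sources are
    scored by pairwise "wins" rather than raw loss values, which are
    not comparable across tasks.
    """
    n = losses.shape[0]
    W = np.ones((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                wins = np.mean(losses[i] < losses[j])   # fraction of images where i beats j
                W[i, j] = wins / max(1.0 - wins, 1e-6)  # AHP-style pairwise ratio
    # The principal eigenvector of W gives a comparable affinity per source.
    vals, vecs = np.linalg.eig(W)
    v = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
    return v / v.sum()

# Toy example: 3 candidate source tasks, 5 test images.
losses = np.array([[0.2, 0.3, 0.25, 0.4, 0.1],
                   [0.5, 0.2, 0.30, 0.3, 0.2],
                   [0.6, 0.7, 0.50, 0.6, 0.5]])
print(ordinal_affinities(losses))  # source 0 should score highest
```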

The purpose of the final component is to extract a true relationship graph of all tasks from this relationship matrix. In other words, we want to find a valuable subgraph within a fully connected graph. Here, the authors use Binary Integer Programming (BIP) to extract a representative subgraph under given constraints, such as a budget on how many source tasks may be trained.
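To convey the flavor of this step, here is a brute-force toy stand-in for the BIP: given normalized affinities, choose a budget-limited set of source tasks so that the total affinity over all target tasks is maximized. The task names and affinity values are invented for illustration, and a real integer-programming solver is needed at the paper's scale.

```python
from itertools import combinations

def best_taxonomy(affinity, budget):
    """Brute-force stand-in for the paper's Binary Integer Program.

    affinity[target][source]: normalized transfer affinity from source
    to target. We pick at most `budget` source tasks and let every
    target use its best available source, maximizing total affinity.
    Enumeration only works at toy sizes; a real BIP solver scales better.
    """
    tasks = list(affinity)
    best_score, best_choice = -1.0, None
    for k in range(1, budget + 1):
        for sources in combinations(tasks, k):
            score = sum(max(affinity[t][s] for s in sources) for t in tasks)
            if score > best_score:
                best_score, best_choice = score, sources
    return best_choice, best_score

# Toy affinity matrix over four tasks, with a budget of two source tasks.
affinity = {
    "depth":   {"depth": 1.0, "normals": 0.8, "edges": 0.3, "pose": 0.2},
    "normals": {"depth": 0.7, "normals": 1.0, "edges": 0.4, "pose": 0.2},
    "edges":   {"depth": 0.3, "normals": 0.4, "edges": 1.0, "pose": 0.1},
    "pose":    {"depth": 0.4, "normals": 0.3, "edges": 0.2, "pose": 1.0},
}
print(best_taxonomy(affinity, budget=2))  # -> (('normals', 'pose'), 3.2)
```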

Experimental Results #

The authors built a new dataset of over 4 million images, annotated for 26 computer vision tasks. From the experiments, the authors discovered several patterns: for example, 3D tasks and 2D tasks each grouped naturally together, while higher-level tasks such as semantic segmentation and scene recognition formed their own clusters.

To test whether the discovered structure actually helps transfer learning, the authors compared transfers guided by the learned structure against transfers based on randomly paired tasks, a kind of random task map. The structure-guided transfers were significantly better. The authors thus demonstrated that the learned structure not only improves performance on target tasks but also offers a good explanation of the relationships between tasks.
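A small simulation can clarify what "better than random" means here: for each target task, compare the source recommended by the learned structure against a randomly drawn one. All names and values below are hypothetical; the paper's actual evaluation measures real transfer performance on held-out data.

```python
import random

def win_rate_vs_random(affinity, recommended, trials=1000, seed=0):
    """Fraction of trials where the recommended source task transfers
    at least as well as a randomly chosen source.

    affinity[target][source]: hypothetical transfer affinity;
    recommended[target]: the source the learned structure suggests.
    """
    rng = random.Random(seed)
    targets = list(affinity)
    wins = 0
    for _ in range(trials):
        target = rng.choice(targets)
        random_source = rng.choice([s for s in affinity[target] if s != target])
        wins += affinity[target][recommended[target]] >= affinity[target][random_source]
    return wins / trials

# Toy data: the recommended sources always match or beat a random pick.
affinity = {
    "depth":   {"depth": 1.0, "normals": 0.8, "edges": 0.3, "pose": 0.2},
    "normals": {"depth": 0.7, "normals": 1.0, "edges": 0.4, "pose": 0.2},
}
recommended = {"depth": "normals", "normals": "depth"}
print(win_rate_vs_random(affinity, recommended))  # 1.0 for this toy data
```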

Conclusion #

Today, I discussed the best paper from CVPR 2018.

Let’s review the main points together: First, we provided a detailed introduction to the problem addressed in this paper and its contributions. The paper explores the relationship between computer vision tasks and proposes a computational framework that can be used for transfer learning. Second, we briefly introduced the core methods proposed in the paper, which consist of four components. Third, we briefly discussed the experimental results presented in the paper.

Finally, I’ll leave you with a question to ponder: the relationships explored here are mainly pairwise relationships between tasks. Is there a method for discovering higher-order relationships, for example among three tasks at once?