023 CVPR 2018 Paper Review - How to Perform 3D Modeling of the Human Body on an Aggregate Level #

Today we look at the best student paper from CVPR 2018, titled “Total Capture: A 3D Deformation Model for Tracking Faces, Hands and Bodies”.

Many academic conferences give a best student paper award to encourage student participation in research, so the usual requirement is that the first author be a student at the time of publication.

The authors of this paper are from Carnegie Mellon University.

The first author, Hanbyul Joo, is a scholar from South Korea who is currently pursuing a PhD at the Robotics Institute of Carnegie Mellon University. His PhD research focuses on “Computational Behavioral Science”. He has published several CVPR and ICCV papers in the field of computer vision.

The second author, Tomas Simon, is also a PhD student at the Robotics Institute of Carnegie Mellon University. His research area is “Spatiotemporal Modeling of 3D Motion”.

The last author is their advisor, Yaser Sheikh, a professor at the Robotics Institute.

Main Contributions of the Paper #

The main problem this paper addresses is building a three-dimensional model of the human body and using it to track the body’s behaviors and activities.

This task may seem simple, but there are several challenges.

Firstly, most previous work on three-dimensional modeling of the human body treated different body parts separately, with distinct models for the face, the body, and the hands. The capture settings for these parts often differ: the face is usually modeled in close-up, while the body is observed in terms of its overall movement. As a result, the parts are modeled at different scales, which makes it difficult to integrate them seamlessly.

Secondly, little prior work has specifically targeted parts such as hair and feet, even though they are essential to modeling the body as a whole.

This paper discusses the modeling of these parts, hair and feet, providing a framework for the overall modeling of the human body. Specifically, the paper presents two models: one called “Frankenstein” and the other called “Adam”.

“Frankenstein” relies on the modeling of different body parts and connects these models together, applying some processing to ensure that the final three-dimensional modeling is realistic. Building upon this model, “Adam” incorporates elements of hair and feet, abandoning some of the special processing of “Frankenstein” to create a simpler model that achieves a more realistic level of detail.

Core Methodology of the Paper #

First, let’s take a look at the model called “Frankenstein”. The idea behind this model is to assemble the best possible models that can represent different parts of the human body. Generally, each part consists of three sets of parameters: motion parameters, shape parameters, and global translation parameters.

For the body, the authors used the SMPL model [1]. This model combines the mean shape of the human body with variations in shape, and then applies a linear blend skinning (LBS) transformation to obtain the final posed body mesh.
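To make the "mean shape plus variations, then LBS" idea concrete, here is a minimal NumPy sketch. It is not SMPL's actual implementation: the function names, array layouts, and toy data are assumptions made for illustration, and it leaves out details of the real model such as pose-dependent corrections and joint locations that depend on shape.

```python
import numpy as np

def blend_shape(mean_shape, shape_basis, shape_coeffs):
    """Mean shape plus a linear combination of shape-variation directions.

    mean_shape:   (V, 3) rest-pose vertices
    shape_basis:  (K, V, 3) shape-variation directions (hypothetical layout)
    shape_coeffs: (K,) shape parameters
    """
    return mean_shape + np.tensordot(shape_coeffs, shape_basis, axes=1)

def linear_blend_skinning(vertices, joint_transforms, skin_weights):
    """Pose each vertex as a weighted sum of per-joint rigid transforms.

    vertices:         (V, 3) shaped rest-pose vertices
    joint_transforms: (J, 4, 4) homogeneous transform for each joint
    skin_weights:     (V, J) non-negative weights, each row summing to 1
    """
    homo = np.hstack([vertices, np.ones((len(vertices), 1))])      # (V, 4)
    per_joint = np.einsum("jab,vb->vja", joint_transforms, homo)   # (V, J, 4)
    posed = np.einsum("vj,vja->va", skin_weights, per_joint)       # (V, 4)
    return posed[:, :3]

# Toy data: two vertices, one shape direction, two joints.
mean_shape = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
shape_basis = np.array([[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]])
shape_coeffs = np.array([0.5])

identity = np.eye(4)
shift_y = np.eye(4)
shift_y[1, 3] = 2.0                      # joint 1 translates by 2 along y
joint_transforms = np.stack([identity, shift_y])
skin_weights = np.array([[1.0, 0.0],     # vertex 0 follows joint 0 only
                         [0.5, 0.5]])    # vertex 1 is split between joints

shaped = blend_shape(mean_shape, shape_basis, shape_coeffs)
posed = linear_blend_skinning(shaped, joint_transforms, skin_weights)
```

In this toy case the second vertex is first stretched along x by the shape coefficient and then moved halfway along y, because only half of its skinning weight belongs to the translating joint.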

For the face, the authors used a model called FaceWarehouse [2]. This model linearly combines the mean shape of the face with variations in shape and with dynamic (expression) variations.

For the hands, no suitable off-the-shelf model was available at the time, so the authors proposed their own: they model the skeleton and joints of the hand and then apply an LBS transformation similar to the one used for the body. The feet are modeled in the same way.
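The linear face model described here can be written compactly as follows. The symbols are notation chosen for this review and may not match the paper's: $\bar{V}^{F}$ is the mean face, $S_i$ are shape-variation directions with coefficients $\phi_i$, and $E_j$ are dynamic (expression) directions with coefficients $\theta_j$.

```latex
V^{F} = \bar{V}^{F} + \sum_{i=1}^{K_s} \phi_i \, S_i + \sum_{j=1}^{K_e} \theta_j \, E_j
```

Unlike the body model, this formulation contains no skinning step: the face mesh is a direct linear function of its coefficients.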

Once we have the models for the body, face, hands, and feet, the next step is to connect these parts together. Initially, the authors retain the model for the body and remove the face, hands, and feet from this model. Then, using the respective global translation parameters for the face, hands, and feet models, the coordinates of these parts are aligned. Finally, the authors also apply a “fusion function” to construct a smooth human structure.
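The stitching step can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names, the use of a pure translation for alignment (a real system would likely need a rotation as well), and the per-vertex blending weights standing in for the “fusion function”. The paper's actual construction may differ.

```python
import numpy as np

def align_part(part_vertices, translation):
    """Move a part model into the body's coordinate frame (translation only)."""
    return part_vertices + translation

def blend_seam(body_seam, part_seam, weights):
    """Blend corresponding seam vertices.

    A weight of 0 keeps the body vertex, a weight of 1 keeps the part vertex.
    """
    w = weights[:, None]
    return (1.0 - w) * body_seam + w * part_seam

# Toy data: two seam vertices shared by the body and a part (say, a hand).
body_seam = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
part_local = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
part_translation = np.array([0.2, 0.0, 0.0])   # hypothetical global offset
weights = np.array([0.0, 0.5])

part_seam = align_part(part_local, part_translation)
blended = blend_seam(body_seam, part_seam, weights)
```

Letting the weights vary smoothly from 0 to 1 across the seam region is one simple way to avoid a visible step where the two meshes meet.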

The “Frankenstein” model is fitted by minimizing an objective function with four terms. The first term fits the “key points”, so that the model’s skeleton follows the observed motion. The second term fits the “3D point cloud”, so that the model’s surface matches the captured body shape. The third term is an additional “heuristic” that compensates for the fact that each body part has its own shape parameters and the parts are not fully connected in the model. The final term is a Gaussian prior, which helps the optimization settle on a unique solution.
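A toy version of such a four-term objective might look like the sketch below. All function names, weights, and data are hypothetical. The third term is modeled here as a penalty on mismatched seam vertices between parts, which is one reading of the “heuristic” and an assumption of this sketch, and the point cloud term uses a brute-force nearest-vertex distance rather than whatever matching the authors actually use.

```python
import numpy as np

def keypoint_term(model_joints, detected_keypoints):
    """Squared distance between model joints and detected 3D key points."""
    return np.sum((model_joints - detected_keypoints) ** 2)

def point_cloud_term(model_vertices, cloud_points):
    """Squared distance from each cloud point to its nearest model vertex."""
    diffs = cloud_points[:, None, :] - model_vertices[None, :, :]   # (P, V, 3)
    sq_dist = np.sum(diffs ** 2, axis=2)                            # (P, V)
    return np.sum(sq_dist.min(axis=1))

def seam_term(body_seam, part_seam):
    """Penalty on gaps between parts where they should meet (assumed form)."""
    return np.sum((body_seam - part_seam) ** 2)

def prior_term(params):
    """Zero-mean, unit-variance Gaussian prior, up to an additive constant."""
    return np.sum(params ** 2)

def total_objective(terms, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four term values."""
    return float(np.dot(weights, terms))

# Toy data.
model_joints = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
detected = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 1.0]])
model_vertices = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
cloud = np.array([[0.0, 0.0, 2.0]])
body_seam = np.array([[0.0, 0.0, 0.0]])
part_seam = np.array([[0.0, 3.0, 0.0]])
params = np.array([1.0, 2.0])

terms = np.array([
    keypoint_term(model_joints, detected),
    point_cloud_term(model_vertices, cloud),
    seam_term(body_seam, part_seam),
    prior_term(params),
])
energy = total_objective(terms)
```

In a real fitting pipeline the model joints and vertices would be functions of the pose and shape parameters, and an optimizer would adjust those parameters to reduce the total.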

Based on the “Frankenstein” model, the authors developed the “Adam” model. To construct “Adam,” they captured the morphological data of 70 human bodies and initially built “Frankenstein” models for these bodies. Based on these models, the authors added the morphology of the hair and clothes to the human body model and redefined the framework of the entire model.

Compared to “Frankenstein,” “Adam” directly models all parts of the human body. This model is quite similar to the specific part models we described earlier - it linearly combines the mean shape, shape variations, and facial expression values of the human body and applies an LBS transformation.

Because “Adam” is a single unified model rather than a set of stitched parts, its objective function has only three terms. The “heuristic” term used in “Frankenstein”, which we discussed earlier, is no longer needed in “Adam”.
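The difference between the two objectives can be summarized in symbols. The term names below are labels chosen for this review, not necessarily the paper's notation, and any per-term weights are omitted.

```latex
\begin{aligned}
E_{\text{Frankenstein}} &= E_{\text{keypoints}} + E_{\text{cloud}} + E_{\text{heuristic}} + E_{\text{prior}} \\
E_{\text{Adam}} &= E_{\text{keypoints}} + E_{\text{cloud}} + E_{\text{prior}}
\end{aligned}
```

The heuristic term exists only to reconcile the separate shape parameters of the stitched parts, so it drops out once all parts share one set of parameters.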

Experimental Results #

In the experiments, the authors used 140 VGA cameras to reconstruct the 3D key points of the body, 480 VGA cameras to reconstruct the feet, and 31 high-definition cameras to reconstruct the facial and hand key points and to build the 3D point clouds.

The authors demonstrated the modeling of three-dimensional human motion using two models, “Frankenstein” and “Adam”. Overall, the performance of these two models is very similar. “Adam” appears more realistic in motion due to the presence of hair and clothing. However, in some cases, the legs reconstructed by “Adam” may appear slightly skinny and inconsistent. The authors attributed this to the lack of data.

Nevertheless, as the first work on full-body three-dimensional modeling and dynamic tracking of the human body, the paper’s results should be considered satisfactory.

Summary #

Today I have talked about the best student paper of CVPR 2018.

Let’s review the key points together: First, we have detailed the problem that this paper aims to solve, which is the 3D modeling of human motion from a holistic perspective. Second, we have briefly introduced the two models proposed in the paper, “Frankenstein” and “Adam.” Third, we have briefly introduced the impressive experimental results achieved in this paper.

Finally, let me leave you with a question to ponder: If we need to improve the “Adam” model, what do you think should be the next step?

References

  1. M. Loper, N. Mahmood, J. Romero, G. Pons-Moll, and M. J. Black. SMPL: A Skinned Multi-Person Linear Model. In TOG, 2015.

  2. C. Cao, Y. Weng, S. Zhou, Y. Tong, and K. Zhou. FaceWarehouse: A 3D Facial Expression Database for Visual Computing. In TVCG, 2014.