ScoreHypo: Probabilistic Human Mesh Estimation with Hypothesis Scoring
CVPR 2024

¹Peking University    ²International Digital Economy Academy (IDEA)
³Shanghai Jiao Tong University

TL;DR: We introduce ScoreHypo, a versatile framework for monocular 3D human mesh estimation that works by (i) generating multiple hypotheses with HypoNet and (ii) selecting high-quality ones with ScoreNet.

Abstract

Monocular 3D human mesh estimation is an ill-posed problem, characterized by inherent ambiguity and occlusion. While recent probabilistic methods generate multiple solutions, little attention has been paid to obtaining high-quality estimates from them. To address this limitation, we introduce ScoreHypo, a versatile framework that first uses our novel HypoNet to generate multiple hypotheses and then employs a carefully designed scorer, ScoreNet, to evaluate them and select high-quality estimates. ScoreHypo formulates estimation as a reverse denoising process, in which HypoNet produces a diverse set of plausible estimates that align well with the image cues. ScoreNet then evaluates and ranks these estimates by quality and identifies the best ones. Experimental results demonstrate that, as a multi-hypothesis mesh estimator, HypoNet outperforms existing state-of-the-art probabilistic methods. Moreover, the estimates selected by ScoreNet significantly outperform randomly chosen hypotheses and simple averaging. Notably, the trained ScoreNet generalizes: it can effectively score the hypotheses of existing methods and reduce their errors by more than 15%.
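The generate-then-select idea can be illustrated with a minimal sketch. It assumes each hypothesis is a flattened vector and that a scorer returns a higher value for a better hypothesis; `select_top_k`, `average`, `l2_error`, and `toy_score` are hypothetical names, and the toy scorer (agreement with a noisy observation) is only a stand-in for ScoreNet, whose actual inputs and architecture are not described on this page.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def l2_error(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Euclidean distance between two flattened estimates (toy error metric)."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)))


def average(hypotheses: Sequence[Vector]) -> Vector:
    """Element-wise mean of all hypotheses (the 'simple averaging' baseline)."""
    n = len(hypotheses)
    return [sum(h[i] for h in hypotheses) / n for i in range(len(hypotheses[0]))]


def select_top_k(hypotheses: Sequence[Vector],
                 score_fn: Callable[[Vector], float], k: int = 1) -> List[Vector]:
    """Rank hypotheses by score (higher is better) and keep the best k."""
    return sorted(hypotheses, key=score_fn, reverse=True)[:k]


# Toy data: a ground truth, four hypotheses (one of them an outlier), and a
# noisy observation that the hypothetical scorer sees instead of the ground truth.
gt = [0.0, 0.0]
hypotheses = [[1.0, 0.0], [0.1, 0.0], [-0.2, 0.1], [3.0, 3.0]]
observation = [0.05, 0.05]


def toy_score(h: Vector) -> float:
    # Hypothetical stand-in for ScoreNet: closer to the observation scores higher.
    return -l2_error(h, observation)


best = select_top_k(hypotheses, toy_score, k=1)[0]
print("selected:", best, "error:", l2_error(best, gt))
print("average error:", l2_error(average(hypotheses), gt))
```

In this toy case the outlier hypothesis pulls the element-wise mean away from the ground truth, while ranking by score discards it; that contrast is the intuition behind preferring a scored selection over simple averaging.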



Results on natural videos



Multiple hypotheses & ScoreNet

Qualitative results on challenging in-the-wild images. The yellow and blue-colored meshes are the generated results of HypoNet, while the green ones are the final results selected by ScoreNet.




Diffusion process

We visualize the denoising process of the hypotheses, which produces high-quality results in just 4 denoising steps. The yellow and blue-colored meshes are the generated results of HypoNet, while the green ones are the final results selected by ScoreNet.
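A few-step reverse denoising loop of this kind can be sketched as follows. This is a minimal sketch under assumptions not stated on this page: it uses a deterministic DDIM-style update (eta = 0) with a cosine noise schedule, and `cosine_alpha_bar`, `ddim_sample`, `toy_denoiser`, and `MODES` are hypothetical names. HypoNet's actual sampler, noise schedule, and image conditioning may differ. Because each seed starts from different Gaussian noise, repeated sampling yields multiple hypotheses.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def cosine_alpha_bar(t: float, s: float = 0.008) -> float:
    """Cumulative signal level of a cosine noise schedule, with t in [0, 1]."""
    def f(u: float) -> float:
        return math.cos((u + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0.0)


def ddim_sample(denoise_x0: Callable[[Vector, float], Vector], dim: int,
                num_steps: int = 4, seed: int = 0) -> Vector:
    """Deterministic DDIM sampling where the denoiser predicts the clean x0."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from pure noise
    times = [1.0 - i / num_steps for i in range(num_steps + 1)]  # 1.0 down to 0.0
    for t, t_prev in zip(times[:-1], times[1:]):
        ab, ab_prev = cosine_alpha_bar(t), cosine_alpha_bar(t_prev)
        x0_hat = denoise_x0(x, t)
        # Noise implied by the current sample and the x0 prediction.
        eps_hat = [(xi - math.sqrt(ab) * x0i) / math.sqrt(1.0 - ab)
                   for xi, x0i in zip(x, x0_hat)]
        # Step to the less noisy timestep along the same deterministic trajectory.
        x = [math.sqrt(ab_prev) * x0i + math.sqrt(1.0 - ab_prev) * ei
             for x0i, ei in zip(x0_hat, eps_hat)]
    return x


# Hypothetical toy denoiser with two equally plausible "poses" (a stand-in for
# depth ambiguity); it snaps its x0 prediction to the mode nearest the sample.
MODES = [[1.0, 0.0], [-1.0, 0.0]]


def toy_denoiser(x: Vector, t: float) -> Vector:
    return min(MODES, key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))


hypotheses = [ddim_sample(toy_denoiser, dim=2, num_steps=4, seed=k) for k in range(8)]
print(hypotheses)
```

With this two-mode toy denoiser, different seeds can end on different modes, loosely mirroring how several plausible meshes arise from one ambiguous image before a scorer chooses among them.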




Citation
