WHAM (video-based mocap research by Carnegie Mellon)

@Michael_J_Black

https://x.com/Michael_J_Black/status/1734830163148296601?s=20

WHAM defines the new state of the art in 3D human pose estimation from video. By a large margin. It’s fast, accurate, and it computes human pose in world coordinates. It’s also the first video-based method to be more accurate than single-image methods. 1/8

WHAM has several key components. First, using the #AMASS dataset, we pre-train a network to encode/decode 2D keypoint sequences. For accuracy we also need image cues, so we also train an image sequence encoder and a network that integrates features from keypoints and video. 2/8
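The thread names these components without giving their shapes. As a minimal, hypothetical PyTorch sketch of the idea, a recurrent encoder over 2D keypoint sequences can be paired with a small module that fuses in per-frame image features. The class names, dimensions, and layer choices below are illustrative assumptions, not WHAM's actual architecture:

```python
import torch
import torch.nn as nn

class MotionEncoder(nn.Module):
    """Illustrative sequence encoder over 2D keypoints (not WHAM's exact design).

    Input:  (batch, frames, n_joints * 2) flattened 2D keypoint coordinates
    Output: (batch, frames, feat_dim) per-frame motion features
    """
    def __init__(self, n_joints=17, feat_dim=512):
        super().__init__()
        self.rnn = nn.GRU(n_joints * 2, feat_dim, batch_first=True)

    def forward(self, keypoints_2d):
        feats, _ = self.rnn(keypoints_2d)
        return feats

class FeatureIntegrator(nn.Module):
    """Fuses keypoint motion features with per-frame image features."""
    def __init__(self, feat_dim=512, img_dim=2048):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(feat_dim + img_dim, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, motion_feats, image_feats):
        # Concatenate along the feature dimension, then project back down.
        return self.fuse(torch.cat([motion_feats, image_feats], dim=-1))
```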

Most methods estimate 3D humans in camera coordinates. To estimate them in global coordinates, we estimate the camera’s angular velocity and use this to decode the person’s global trajectory. To get accurate global motion, we also estimate and use foot-ground contact. 3/8
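To make the trajectory-decoding step concrete, here is a simplified sketch of rolling per-frame root motion out into world coordinates, with predicted foot-ground contact used to damp velocity and suppress drift. This is a crude stand-in for WHAM's actual contact refinement; all tensor shapes are assumptions:

```python
import torch

def roll_out_trajectory(root_rot, root_vel, contact_prob):
    """Integrate per-frame root motion into a world-coordinate trajectory.

    root_rot:     (frames, 3, 3) world-frame root orientation per frame
    root_vel:     (frames, 3)    root velocity in the root's local frame
    contact_prob: (frames,)      predicted foot-ground contact in [0, 1]
    """
    pos = torch.zeros(3)
    trajectory = []
    for R, v, c in zip(root_rot, root_vel, contact_prob):
        v_world = R @ v                 # rotate local velocity into world frame
        v_world = v_world * (1.0 - c)   # damp motion when a foot is planted
        pos = pos + v_world             # (crude drift suppression)
        trajectory.append(pos.clone())
    return torch.stack(trajectory)      # (frames, 3) world positions
```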

The core method runs at over 200fps, orders of magnitude faster than recent methods like SLAHMR and PACE, which are based on optimization instead of direct regression. 4/8

We evaluate pose accuracy on 3DPW, RICH, and EMDB; WHAM achieves the lowest errors by a large margin. EMDB is a challenging new dataset with long sequences and a moving camera. Following a person with a camera is particularly challenging for prior methods like SLAHMR and TRACE. 5/8

WHAM uses an estimate of the camera’s angular velocity, which we can get either using a SLAM method or the camera’s gyro, if available. In either case, WHAM is significantly better at estimating a person’s global trajectory, suffering much less from drift. 6/8
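The angular velocity can be read off a gyro directly; from SLAM output it can be approximated via the axis-angle form of the relative rotation between consecutive camera poses. A small NumPy sketch of the SLAM route (the frame conventions here are assumptions):

```python
import numpy as np

def angular_velocity_from_slam(R_prev, R_next, dt):
    """Approximate camera angular velocity from two consecutive SLAM rotations.

    Computes w = log(R_prev^T @ R_next) / dt using the axis-angle form of the
    relative rotation. With a gyroscope, the same quantity comes directly
    from the sensor instead.
    """
    R_rel = R_prev.T @ R_next
    cos_theta = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)        # rotation angle between the two frames
    if theta < 1e-8:
        return np.zeros(3)              # effectively no rotation
    axis = np.array([                   # axis from the skew-symmetric part
        R_rel[2, 1] - R_rel[1, 2],
        R_rel[0, 2] - R_rel[2, 0],
        R_rel[1, 0] - R_rel[0, 1],
    ]) / (2.0 * np.sin(theta))
    return axis * theta / dt            # rad/s, in the previous frame's axes
```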

Surprisingly, until now, video-based methods for 3D human pose estimation were less accurate than single-frame methods. There is limited video training data with ground truth. Our approach leverages AMASS for scale and image features for robustness, getting the best of both. 7/8
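One way a mocap archive like AMASS can supply 2D keypoint sequences at scale is to project its 3D joints through a virtual camera. The thread doesn't spell out the synthesis procedure, but a minimal pinhole-projection sketch, with placeholder intrinsics, looks like this:

```python
import numpy as np

def project_joints(joints_3d, focal=1000.0, center=(512.0, 512.0)):
    """Pinhole-project camera-frame 3D joints (frames, J, 3) to 2D pixels.

    Assumes all joints lie in front of the camera (z > 0). The focal length
    and principal point are arbitrary placeholder values.
    """
    z = joints_3d[..., 2]
    u = focal * joints_3d[..., 0] / z + center[0]
    v = focal * joints_3d[..., 1] / z + center[1]
    return np.stack([u, v], axis=-1)    # (frames, J, 2) pixel coordinates
```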

This is a great collaboration between @PerceivingSys (the Department of Perceiving Systems, part of the Max Planck Institute for Intelligent Systems in Tübingen, Germany) and @CarnegieMellon (Carnegie Mellon University in Pittsburgh, PA, USA). @soyong_shin continues a tradition of amazing @MPI_IS interns. Authors: @soyong_shin, Juyong Kim, @enihalilaj and myself. 8/8

arXiv: https://arxiv.org/abs/2312.07531
project: https://wham.is.tue.mpg.de

 

 




