RTMW3D

Real-Time 3D Body Pose Estimation (Multi Person)

RTMW3D is a real-time, multi-person 3D whole-body pose estimation pipeline that runs entirely in the browser via ONNX Runtime Web. It uses two ONNX models: RTMDet-s for person detection and RTMW3D-x for pose estimation.

Up to 4 people can be tracked simultaneously. Each person is assigned a distinct color in both the 2D overlay and the 3D view. The 3D view is interactive — drag to rotate, scroll to zoom.

Pipeline

  1. Each video frame is letterboxed to 640×640 and fed to RTMDet-s to detect people.
  2. Each detected bounding box is cropped and padded to 288×384 (model aspect ratio) and fed to RTMW3D-x.
  3. The model outputs SimCC probability distributions for X, Y, and Z. Argmax decoding gives pixel-space XY and root-relative depth Z.
  4. Keypoints are back-projected to image space (2D overlay) and converted to metric coordinates (3D view).
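The geometry behind the four steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: every function and type name is hypothetical, the letterbox is assumed to pad symmetrically, and the SimCC split ratio of 2 is an assumption borrowed from common RTMPose configurations. Converting Z to metric coordinates is model-specific and is left out.

```typescript
// Illustrative sketch only; names and defaults are assumptions, not RTMW3D source.
interface Letterbox { scale: number; padX: number; padY: number; }
interface Box { cx: number; cy: number; w: number; h: number; }

// Step 1: scale and (assumed centered) padding that fit a frame into a square input.
function letterbox(srcW: number, srcH: number, size = 640): Letterbox {
  const scale = Math.min(size / srcW, size / srcH);
  return {
    scale,
    padX: (size - srcW * scale) / 2,
    padY: (size - srcH * scale) / 2,
  };
}

// Map a detector-space point back to original frame pixels.
function unletterbox(x: number, y: number, lb: Letterbox): [number, number] {
  return [(x - lb.padX) / lb.scale, (y - lb.padY) / lb.scale];
}

// Step 2: grow the shorter side of a box until w / h matches the model aspect ratio.
function fitAspect(box: Box, aspect = 288 / 384): Box {
  const w = Math.max(box.w, box.h * aspect);
  const h = Math.max(box.h, box.w / aspect);
  return { cx: box.cx, cy: box.cy, w, h };
}

// Step 3: index of the largest bin in one SimCC distribution.
function argmax(values: ArrayLike<number>): number {
  let best = 0;
  let bestVal = -Infinity;
  for (let i = 0; i < values.length; i++) {
    const v = values[i] ?? -Infinity;
    if (v > bestVal) {
      bestVal = v;
      best = i;
    }
  }
  return best;
}

// Convert the winning bin to a coordinate in model-input pixels.
// splitRatio = bins per input pixel (assumed 2 here).
function decodeSimcc(bins: ArrayLike<number>, splitRatio = 2): number {
  return argmax(bins) / splitRatio;
}

// Step 4: map a model-input coordinate back into the original image via the crop box.
function cropToImage(x: number, y: number, box: Box, inW = 288, inH = 384): [number, number] {
  return [
    box.cx - box.w / 2 + (x / inW) * box.w,
    box.cy - box.h / 2 + (y / inH) * box.h,
  ];
}
```

For a 1280×720 frame, `letterbox(1280, 720)` gives a scale of 0.5 with 140 px of padding above and below, so the detector-space point (320, 320) maps back to (640, 360) in the frame.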

Model info

Property              Value
Detector              RTMDet-s
Pose model            RTMW3D-x (OpenMMLab)
Keypoints displayed   17 body (COCO subset of 133 wholebody)
3D coordinate system  Root-relative metres, Y-up
Max people            4 simultaneously
Runtime               WebGPU (preferred) → WASM fallback
Model size            RTMDet-s ~3 MB · RTMW3D-x ~369 MB (browser-cached)
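The WebGPU-then-WASM runtime preference could be expressed roughly as below. This is a sketch under assumptions: `providerOrder` and `detectWebGPU` are hypothetical helpers, not part of this project or of ONNX Runtime Web. The provider names "webgpu" and "wasm" are the ones ONNX Runtime Web accepts in its `executionProviders` session option.

```typescript
// Hypothetical helper: execution-provider order, preferred first.
function providerOrder(hasWebGPU: boolean): string[] {
  return hasWebGPU ? ["webgpu", "wasm"] : ["wasm"];
}

// Feature-detect WebGPU via navigator.gpu without relying on DOM typings.
function detectWebGPU(): boolean {
  const nav = (globalThis as unknown as { navigator?: { gpu?: unknown } }).navigator;
  return nav !== undefined && nav.gpu !== undefined;
}

// In the browser the list would be handed to ONNX Runtime Web, for example:
//   const session = await ort.InferenceSession.create(modelUrl, {
//     executionProviders: providerOrder(detectWebGPU()),
//   });
// (modelUrl is a placeholder; the actual model URLs are not given here.)
```

Listing "wasm" after "webgpu" lets the runtime fall back to WASM if WebGPU initialization fails, even when the browser exposes `navigator.gpu`.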

CSV output columns

One row per person per frame. Columns:

17 COCO body keypoints

nose · left_eye · right_eye · left_ear · right_ear · left_shoulder · right_shoulder · left_elbow · right_elbow · left_wrist · right_wrist · left_hip · right_hip · left_knee · right_knee · left_ankle · right_ankle
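As an illustration of "one row per person per frame", the sketch below serializes 17 keypoints into a single CSV row. The exact column names and order of the real export are not shown here, so the layout used (`frame`, `person`, then `<keypoint>_x`, `_y`, `_z`) is purely an assumption, as are the function names.

```typescript
// Keypoint names in the order listed above.
const COCO17 = [
  "nose", "left_eye", "right_eye", "left_ear", "right_ear",
  "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
  "left_wrist", "right_wrist", "left_hip", "right_hip",
  "left_knee", "right_knee", "left_ankle", "right_ankle",
] as const;

interface Keypoint3D { x: number; y: number; z: number; }

// Hypothetical header: frame index, person index, then x/y/z per keypoint.
function csvHeader(): string {
  const cols: string[] = ["frame", "person"];
  for (const name of COCO17) {
    cols.push(`${name}_x`, `${name}_y`, `${name}_z`);
  }
  return cols.join(",");
}

// One row for one person in one frame.
function csvRow(frame: number, person: number, kpts: Keypoint3D[]): string {
  if (kpts.length !== COCO17.length) {
    throw new Error(`expected ${COCO17.length} keypoints, got ${kpts.length}`);
  }
  const cells: string[] = [String(frame), String(person)];
  for (const k of kpts) {
    cells.push(k.x.toFixed(4), k.y.toFixed(4), k.z.toFixed(4));
  }
  return cells.join(",");
}
```

With this assumed layout each row has 2 + 17 × 3 = 53 cells, and a frame with three tracked people produces three rows.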

Source

Models: OpenMMLab RTMPose · ONNX weights: Soykaf/RTMW3D-x on HuggingFace

Source Code