RTMPose

Real-Time Multi-Person 2D Pose Estimation

RTMPose is a real-time, high-performance pose estimation framework from OpenMMLab. It detects 17 COCO body keypoints per person using a two-stage pipeline: a lightweight person detector (RTMDet) finds each person, and the RTMPose model then estimates that person's keypoints. Inference runs in your browser via ONNX Runtime Web, so no data leaves your device.

RTMPose uses the SimCC (Simple Coordinate Classification) codec, which treats keypoint localization as two independent 1-D classification tasks: one for the X axis and one for the Y axis. This design removes the need for full 2-D heatmaps and reaches heatmap-level accuracy at a fraction of the computational cost, which makes it well suited to real-time browser inference.
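To make the idea concrete, the sketch below decodes a single keypoint from its two 1-D score vectors. It is a minimal illustration, not the tool's actual code: the function name `decode_simcc`, the `split_ratio` of 2.0 (bins per pixel), and the choice of the weaker axis peak as the confidence are all assumptions.

```python
from typing import List, Tuple


def decode_simcc(
    simcc_x: List[float],
    simcc_y: List[float],
    split_ratio: float = 2.0,  # assumed number of bins per pixel
) -> Tuple[float, float, float]:
    """Decode one keypoint from its X and Y classification scores."""
    # Pick the highest-scoring bin independently on each axis.
    ix = max(range(len(simcc_x)), key=simcc_x.__getitem__)
    iy = max(range(len(simcc_y)), key=simcc_y.__getitem__)

    # Convert bin indices back to pixel coordinates of the model input.
    x = ix / split_ratio
    y = iy / split_ratio

    # Assumed confidence: the weaker of the two axis peaks.
    score = min(simcc_x[ix], simcc_y[iy])
    return x, y, score


# Toy score vectors (real ones are much longer).
x, y, score = decode_simcc(
    [0.0, 0.1, 0.9, 0.2],
    [0.05, 0.8, 0.1, 0.0, 0.0, 0.0],
)
print(x, y, score)
```

Because each axis is a plain argmax over a 1-D vector, the decoding cost grows with width plus height rather than width times height, which is where the saving over 2-D heatmaps comes from.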

The tool supports two modes:

  1. Multi-person (default): RTMDet detects all people in the frame; RTMPose runs on each cropped region.
  2. Single-person / full-frame: RTMPose runs directly on the whole frame. This is useful when only one subject is present or when the detection model has not been downloaded.
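The two modes differ only in which regions are handed to the pose model. The sketch below shows that choice and how a keypoint predicted inside a crop could be mapped back to frame coordinates. All names here (`regions_to_process`, `crop_to_frame`) are hypothetical, the 192x256 input size is an assumption, and the crop is assumed to be stretched to the model input without aspect-preserving padding, which a real pipeline would likely handle differently.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # x1, y1, x2, y2 in frame pixels


def regions_to_process(
    frame_w: int, frame_h: int, detections: Optional[List[Box]]
) -> List[Box]:
    """Return the regions the pose model should run on."""
    if detections is None:
        # Single-person / full-frame mode: no detector output available.
        return [(0.0, 0.0, float(frame_w), float(frame_h))]
    # Multi-person mode: one region per detected person.
    return list(detections)


def crop_to_frame(
    kx: float, ky: float, box: Box, input_w: int = 192, input_h: int = 256
) -> Tuple[float, float]:
    """Map a keypoint from model-input pixels back to frame pixels."""
    x1, y1, x2, y2 = box
    return (
        x1 + kx * (x2 - x1) / input_w,
        y1 + ky * (y2 - y1) / input_h,
    )


# Full-frame fallback for a 640x480 frame.
print(regions_to_process(640, 480, None))
# Centre of the model input maps to the centre of the person box.
print(crop_to_frame(96, 128, (100.0, 50.0, 200.0, 250.0)))
```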

The following 17 COCO keypoints are detected per person:

  1. Nose
  2. Left Eye
  3. Right Eye
  4. Left Ear
  5. Right Ear
  6. Left Shoulder
  7. Right Shoulder
  8. Left Elbow
  9. Right Elbow
  10. Left Wrist
  11. Right Wrist
  12. Left Hip
  13. Right Hip
  14. Left Knee
  15. Right Knee
  16. Left Ankle
  17. Right Ankle
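For scripts that consume the output, the list above can be kept as an ordered constant. The sketch below also derives per-keypoint column names in the camelCase style that the export uses for `nose`, `leftEye`, and `rightAnkle`; that the remaining keypoints follow the same naming pattern is an assumption, and the helper names are hypothetical.

```python
from typing import List

# COCO keypoints in the order listed above (index 0 = Nose).
KEYPOINTS: List[str] = [
    "Nose", "Left Eye", "Right Eye", "Left Ear", "Right Ear",
    "Left Shoulder", "Right Shoulder", "Left Elbow", "Right Elbow",
    "Left Wrist", "Right Wrist", "Left Hip", "Right Hip",
    "Left Knee", "Right Knee", "Left Ankle", "Right Ankle",
]


def camel(name: str) -> str:
    """'Left Eye' -> 'leftEye', 'Nose' -> 'nose'."""
    first, *rest = name.split()
    return first.lower() + "".join(word.capitalize() for word in rest)


def expected_columns() -> List[str]:
    """Build the assumed column order: metadata, then score/x/y per keypoint."""
    cols = ["frame", "timestamp", "tag", "person"]
    for name in KEYPOINTS:
        key = camel(name)
        cols += [f"{key}_score", f"{key}_x", f"{key}_y"]
    return cols


columns = expected_columns()
print(len(columns), columns[4:7], columns[-3:])
```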

The data is exported as a CSV file structured as follows (only the first two keypoints and the last one are shown):

frame  timestamp      tag  person  nose_score  nose_x  nose_y  leftEye_score  leftEye_x  leftEye_y  ...  rightAnkle_score  rightAnkle_x  rightAnkle_y
0      1700000000000  0    0       0.982       312.4   118.7   0.971          302.1      108.2      ...  0.834             328.6         489.3

Each row represents one detected person in one frame. The score columns hold the SimCC confidence for that keypoint (0–1). Keypoints below the confidence threshold are exported as empty cells.
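When loading the export for analysis, the empty cells need explicit handling. The following is a minimal sketch using only the Python standard library; it assumes a comma delimiter and a header row, converts empty cells to `None`, and leaves `tag` as a raw string because its meaning is not specified here. The function name `read_pose_csv` is hypothetical.

```python
import csv
import io
from typing import Dict, List, Optional, Union

Value = Optional[Union[int, float, str]]


def read_pose_csv(text: str, delimiter: str = ",") -> List[Dict[str, Value]]:
    """Parse exported rows; one dict per detected person per frame."""
    rows: List[Dict[str, Value]] = []
    for record in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        row: Dict[str, Value] = {}
        for key, val in record.items():
            if key is None:
                continue  # ignore surplus cells without a header
            if val is None or val == "":
                row[key] = None  # keypoint below the confidence threshold
            elif key in ("frame", "timestamp", "person"):
                row[key] = int(val)
            elif key == "tag":
                row[key] = val  # kept as text; semantics not documented
            else:
                row[key] = float(val)  # *_score, *_x, *_y columns
        rows.append(row)
    return rows


sample = (
    "frame,timestamp,tag,person,nose_score,nose_x,nose_y\n"
    "0,1700000000000,0,0,0.982,312.4,118.7\n"
    "0,1700000000000,0,1,,,\n"
)
for row in read_pose_csv(sample):
    print(row)
```

Keeping missing keypoints as `None` rather than zero avoids treating them as real detections at the image origin.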

Model info

The first load downloads about 22 MB (pose model) plus about 3 MB (detection model) from Hugging Face. Subsequent visits load the models from the browser cache, so they do not need to be downloaded again.