RTMPose
Real-Time Multi-Person 2D Pose Estimation
RTMPose is a real-time, high-performance pose estimation framework from OpenMMLab. It detects 17 COCO body keypoints per person using a two-stage pipeline: a lightweight person detector (RTMDet) finds each person, and the RTMPose model then estimates that person's keypoints. Both models run in your browser via ONNX Runtime Web, so no data leaves your device.
RTMPose uses the SimCC (Simple Coordinate Classification) codec, which treats keypoint localization as two independent 1-D classification tasks: one for the X axis and one for the Y axis. Because it never predicts a full 2-D heatmap, it reaches accuracy comparable to heatmap-based methods at a fraction of the computational cost, which makes it well suited to real-time inference in the browser.
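To make the "two 1-D classifications" idea concrete, here is a minimal decoding sketch. It assumes the model emits, per keypoint, one row of X-bin scores and one row of Y-bin scores, and that the bins are finer than pixels by a split ratio of 2.0 (a common RTMPose setting, assumed here rather than confirmed for this tool). Each axis is decoded with its own argmax, and the keypoint confidence is taken as the smaller of the two axis maxima, which to our understanding mirrors MMPose's reference decoder. `decodeSimcc` and `argmax` are hypothetical helper names.

```typescript
// Decoded keypoint in model-input pixel coordinates.
interface Keypoint {
  x: number;
  y: number;
  score: number;
}

// Index and value of the largest entry in values[start .. start + length).
function argmax(values: Float32Array, start: number, length: number): [number, number] {
  let bestIdx = 0;
  let bestVal = -Infinity;
  for (let i = 0; i < length; i++) {
    const v = values[start + i];
    if (v > bestVal) {
      bestVal = v;
      bestIdx = i;
    }
  }
  return [bestIdx, bestVal];
}

// Hypothetical SimCC decoder: simccX holds numKeypoints rows of X bins,
// simccY holds numKeypoints rows of Y bins (row-major, one row per keypoint).
function decodeSimcc(
  simccX: Float32Array,
  simccY: Float32Array,
  numKeypoints: number,
  splitRatio = 2.0, // assumption: bins per input pixel
): Keypoint[] {
  const binsX = simccX.length / numKeypoints;
  const binsY = simccY.length / numKeypoints;
  const keypoints: Keypoint[] = [];
  for (let k = 0; k < numKeypoints; k++) {
    // Each axis is an independent 1-D classification: pick its most likely bin.
    const [ix, vx] = argmax(simccX, k * binsX, binsX);
    const [iy, vy] = argmax(simccY, k * binsY, binsY);
    keypoints.push({
      x: ix / splitRatio, // bin index -> input-pixel coordinate
      y: iy / splitRatio,
      score: Math.min(vx, vy), // confidence limited by the weaker axis
    });
  }
  return keypoints;
}
```

For a 256 × 192 input with a split ratio of 2.0, each keypoint would have 384 X bins and 512 Y bins, far fewer values than a 2-D heatmap of comparable resolution.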
The tool supports two modes:
- Multi-person (default): RTMDet detects all people in the frame; RTMPose runs on each cropped region.
- Single-person / full-frame: RTMPose runs directly on the whole frame. This is useful when only one subject is present or when the detection model has not been downloaded.
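The two modes differ only in which image region is handed to RTMPose. A minimal sketch, assuming a 1.25× padding around each detector box (a common top-down default, not confirmed for this tool) and a crop enlarged along its short side to match the model's 192:256 width-to-height ratio so people are not distorted. `boxToCrop` and `cropsForFrame` are hypothetical names.

```typescript
// Axis-aligned box in frame pixels, e.g. one RTMDet person detection.
interface Box {
  x1: number;
  y1: number;
  x2: number;
  y2: number;
}

// Crop region described by its center and size, in frame pixels.
interface Crop {
  cx: number;
  cy: number;
  w: number;
  h: number;
}

const INPUT_W = 192; // pose-model input width
const INPUT_H = 256; // pose-model input height

// Hypothetical helper: pad a box and fix its aspect ratio to the model input.
function boxToCrop(box: Box, padding = 1.25): Crop {
  const cx = (box.x1 + box.x2) / 2;
  const cy = (box.y1 + box.y2) / 2;
  let w = (box.x2 - box.x1) * padding;
  let h = (box.y2 - box.y1) * padding;
  const aspect = INPUT_W / INPUT_H;
  // Grow the short side so the crop keeps the model's width:height ratio.
  if (w > h * aspect) {
    h = w / aspect;
  } else {
    w = h * aspect;
  }
  return { cx, cy, w, h };
}

// Multi-person mode uses detector boxes; passing null selects full-frame mode.
function cropsForFrame(frameW: number, frameH: number, detections: Box[] | null): Crop[] {
  if (detections === null) {
    // Full-frame mode: the whole frame is the region, with no extra padding.
    return [boxToCrop({ x1: 0, y1: 0, x2: frameW, y2: frameH }, 1.0)];
  }
  // Wrap in a lambda so map's index argument is not passed as the padding.
  return detections.map((b) => boxToCrop(b));
}
```

Each crop's center and size would then define the affine warp that resamples that region to 256 × 192 before inference; keypoints decoded in crop space are mapped back through the inverse warp.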
The following 17 COCO keypoints are detected per person:
- Nose
- Left Eye
- Right Eye
- Left Ear
- Right Ear
- Left Shoulder
- Right Shoulder
- Left Elbow
- Right Elbow
- Left Wrist
- Right Wrist
- Left Hip
- Right Hip
- Left Knee
- Right Knee
- Left Ankle
- Right Ankle
The data is exported as a CSV file structured as follows:
| frame | timestamp | tag | person | nose_score | nose_x | nose_y | leftEye_score | leftEye_x | leftEye_y | … | rightAnkle_score | rightAnkle_x | rightAnkle_y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1700000000000 | 0 | 0 | 0.982 | 312.4 | 118.7 | 0.971 | 302.1 | 108.2 | … | 0.834 | 328.6 | 489.3 |
Each row represents one detected person in one frame. The score columns hold the SimCC confidence for that keypoint (0–1). Keypoints below the confidence threshold are exported as empty cells.
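The column layout above can be generated from the ordered keypoint list. The sketch below is one way to do it, assuming the keypoint array is in the COCO order listed earlier, that all three cells (score, x, y) are blanked for a keypoint below the threshold, and that numbers are rounded as in the sample row (three decimals for scores, one for coordinates). The meaning of the `tag` column is not specified here, so it is passed through unchanged. `csvHeader`, `csvRow`, and `PersonRecord` are hypothetical names.

```typescript
// COCO keypoint order (index = model output row), named as in the CSV header.
const KEYPOINT_NAMES = [
  "nose", "leftEye", "rightEye", "leftEar", "rightEar",
  "leftShoulder", "rightShoulder", "leftElbow", "rightElbow",
  "leftWrist", "rightWrist", "leftHip", "rightHip",
  "leftKnee", "rightKnee", "leftAnkle", "rightAnkle",
] as const;

interface Keypoint {
  x: number;
  y: number;
  score: number;
}

// One detected person in one frame (hypothetical record shape).
interface PersonRecord {
  frame: number;
  timestamp: number;
  tag: string | number; // passed through as-is
  person: number;
  keypoints: Keypoint[]; // 17 entries in KEYPOINT_NAMES order
}

function csvHeader(): string {
  const cols = ["frame", "timestamp", "tag", "person"];
  for (const name of KEYPOINT_NAMES) {
    cols.push(`${name}_score`, `${name}_x`, `${name}_y`);
  }
  return cols.join(",");
}

function csvRow(rec: PersonRecord, threshold: number): string {
  const cells = [String(rec.frame), String(rec.timestamp), String(rec.tag), String(rec.person)];
  for (const kp of rec.keypoints) {
    if (kp.score < threshold) {
      cells.push("", "", ""); // below threshold: export empty cells (assumed for all three)
    } else {
      cells.push(kp.score.toFixed(3), kp.x.toFixed(1), kp.y.toFixed(1));
    }
  }
  return cells.join(",");
}
```

Writing `csvHeader()` once and then one `csvRow(...)` per person per frame yields 4 + 17 × 3 = 55 columns, matching the table above.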
Features
- Real-time webcam pose estimation
- Upload any video file for analysis
- Scrub through the uploaded video frame-by-frame
- Download all predictions as a CSV file
- Adjustable confidence threshold and display options
- FPS counter and inference-latency display
- Runs entirely in the browser — no server needed
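An FPS counter and latency display are usually smoothed so the numbers stay readable from frame to frame. A minimal sketch using an exponential moving average, assuming a smoothing factor of 0.1 (an illustrative choice, not the tool's documented behavior); `FrameStats` is a hypothetical class name.

```typescript
// Hypothetical overlay statistics: smoothed frames per second and inference latency.
class FrameStats {
  fps = 0; // 0 means "no estimate yet"
  latencyMs = 0; // 0 means "no estimate yet"
  private lastFrameMs: number | null = null;
  private readonly alpha: number;

  constructor(alpha = 0.1) {
    this.alpha = alpha; // EMA weight given to the newest sample
  }

  // Call once per processed frame with a timestamp and the measured inference time.
  update(nowMs: number, inferenceMs: number): void {
    if (this.lastFrameMs !== null) {
      const dt = nowMs - this.lastFrameMs;
      if (dt > 0) {
        const instantFps = 1000 / dt;
        // Seed with the first sample, then blend new samples in gradually.
        this.fps = this.fps === 0 ? instantFps : this.fps + this.alpha * (instantFps - this.fps);
      }
    }
    this.latencyMs =
      this.latencyMs === 0 ? inferenceMs : this.latencyMs + this.alpha * (inferenceMs - this.latencyMs);
    this.lastFrameMs = nowMs;
  }
}
```

In a browser, `nowMs` would typically come from `performance.now()`, and `inferenceMs` from the difference between two `performance.now()` readings taken around the model call.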
Model info
- Pose model: RTMPose-s, 17 COCO body keypoints, 256 × 192 input (height × width)
- Detection model: RTMDet-nano, person detector, 320 × 320 input
- Runtime: ONNX Runtime Web (WebAssembly backend)
- Models are downloaded once and cached by the browser.
The first load downloads about 22 MB (pose model) plus about 3 MB (detection model) from Hugging Face. Subsequent visits load the models from the browser cache instead of downloading them again.
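Before each pose inference, the 256 × 192 crop has to be turned into a float tensor. The sketch below assumes an RGBA pixel buffer (as in a canvas `ImageData.data`), an NCHW layout of `[1, 3, 256, 192]`, and the ImageNet RGB mean/std that MMPose's RTMPose configs use; check these against the exported ONNX model before relying on them. `rgbaToNchw` is a hypothetical name.

```typescript
// Assumed normalization constants: ImageNet mean/std in RGB channel order.
const MEAN = [123.675, 116.28, 103.53];
const STD = [58.395, 57.12, 57.375];

// Convert an RGBA buffer of width x height pixels into a normalized
// Float32 tensor laid out as [1, 3, height, width] (NCHW).
function rgbaToNchw(rgba: Uint8ClampedArray, width: number, height: number): Float32Array {
  const plane = width * height;
  if (rgba.length !== plane * 4) {
    throw new Error(`expected ${plane * 4} bytes, got ${rgba.length}`);
  }
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    for (let c = 0; c < 3; c++) {
      // Skip alpha; write each RGB channel into its own contiguous plane.
      out[c * plane + i] = (rgba[i * 4 + c] - MEAN[c]) / STD[c];
    }
  }
  return out;
}
```

With ONNX Runtime Web, the result could be wrapped as `new ort.Tensor("float32", data, [1, 3, 256, 192])` and passed to the session; the input name depends on how the model was exported.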