RTMPose


Real-Time Multi-Person 2D Pose Estimation

RTMPose is a real-time, high-performance pose estimation framework from OpenMMLab. This tool detects 17 COCO body keypoints per person with a two-stage pipeline: a lightweight person detector (RTMDet) locates each person, then the RTMPose model estimates the keypoints inside each detection. Both models run in your browser via ONNX Runtime Web, so no data leaves your device.

RTMPose uses the SimCC (Simple Coordinate Classification) codec, which treats keypoint localization as two independent 1-D classification tasks, one for the X axis and one for the Y axis. This removes the need for full 2-D heatmaps while keeping accuracy comparable to heatmap-based methods at a fraction of the computational cost, which makes it well suited to real-time browser inference.
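
To make the two 1-D classification tasks concrete, here is a minimal Python sketch of how one keypoint could be decoded from its X and Y vectors. It is an illustration, not the tool's actual code: the function names are hypothetical, the split ratio of 2.0 (two bins per input pixel) is an assumption based on commonly published RTMPose configurations, and taking the weaker of the two peaks as the score is one common convention rather than something this page confirms.

```python
from typing import List, Tuple

# Assumption: each 1-D SimCC vector has (input size * SPLIT_RATIO) bins.
# 2.0 is a commonly used value; check the configuration of your own model.
SPLIT_RATIO = 2.0


def argmax(values: List[float]) -> int:
    """Return the index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def decode_simcc(simcc_x: List[float], simcc_y: List[float],
                 split_ratio: float = SPLIT_RATIO) -> Tuple[float, float, float]:
    """Decode one keypoint from its X and Y classification vectors.

    Returns (x, y, score) in model-input pixel coordinates.
    """
    ix = argmax(simcc_x)  # most likely X bin
    iy = argmax(simcc_y)  # most likely Y bin
    # Each bin covers 1 / split_ratio pixels, so dividing maps bins to pixels.
    x = ix / split_ratio
    y = iy / split_ratio
    # Assumed convention: a keypoint is only as confident as its weaker axis.
    score = min(simcc_x[ix], simcc_y[iy])
    return x, y, score
```

Under these assumptions a 256 × 192 input yields 384 X bins and 512 Y bins per keypoint, so decoding costs two short argmax scans instead of a search over a full 2-D heatmap.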

The tool supports two modes:

Multi-person (default): RTMDet detects all people in the frame; RTMPose runs on each cropped region.
Single-person / full-frame: RTMPose runs directly on the whole frame — useful when only one subject is present or when the detection model has not been downloaded.
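
In both modes the pose model works in the coordinate system of its own input, so the keypoints have to be mapped back to frame pixels. The Python sketch below shows one way this could be done. The helper names are hypothetical, and it assumes that each detection box is first grown to the model's 192:256 aspect ratio and then resized directly to the model input; the actual tool may pad or scale the box differently.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in frame pixels


def fit_aspect(box: Box, input_w: int = 192, input_h: int = 256) -> Box:
    """Grow a non-empty box around its centre to the model's aspect ratio."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = x2 - x1, y2 - y1
    target = input_w / input_h
    if w / h > target:
        h = w / target  # box is too wide: add height
    else:
        w = h * target  # box is too tall: add width
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def crop_to_frame(x: float, y: float, crop_box: Box,
                  input_w: int = 192, input_h: int = 256) -> Tuple[float, float]:
    """Map a keypoint from model-input pixels back to frame pixels."""
    x1, y1, x2, y2 = crop_box
    return (x1 + x * (x2 - x1) / input_w,
            y1 + y * (y2 - y1) / input_h)


# Multi-person mode: one crop per detected box.
crop = fit_aspect((100.0, 50.0, 200.0, 250.0))
centre = crop_to_frame(96.0, 128.0, crop)

# Full-frame mode: the crop is simply the whole frame (here 640 x 480).
corner = crop_to_frame(192.0, 256.0, (0.0, 0.0, 640.0, 480.0))
```

In full-frame mode the whole image is squeezed into the model input, so a frame whose aspect ratio differs from 192:256 is distorted unless it is padded first; this sketch does not handle padding.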

The following 17 COCO keypoints are detected per person:

Nose
Left Eye
Right Eye
Left Ear
Right Ear
Left Shoulder
Right Shoulder
Left Elbow
Right Elbow
Left Wrist
Right Wrist
Left Hip
Right Hip
Left Knee
Right Knee
Left Ankle
Right Ankle

The data is exported as a CSV file structured as follows:

frame	timestamp	tag	person	nose_score	nose_x	nose_y	leftEye_score	leftEye_x	leftEye_y	…	rightAnkle_score	rightAnkle_x	rightAnkle_y
0	1700000000000	0	0	0.982	312.4	118.7	0.971	302.1	108.2	…	0.834	328.6	489.3

Each row represents one detected person in one frame. The score columns hold the SimCC confidence for that keypoint (0–1). Keypoints below the confidence threshold are exported as empty cells.
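
For downstream analysis, the export could be read with a short Python sketch like the one below. It is not part of the tool. The function name is hypothetical, and it discovers keypoint names from the header (every column ending in `_score`) rather than hard-coding them. The delimiter is an assumption: pass "\t" if your file is tab-separated, as the sample above appears. A keypoint is treated as missing when any of its three cells is empty.

```python
import csv
import io
from typing import Dict, List, Optional, Tuple

Keypoint = Optional[Tuple[float, float, float]]  # (x, y, score), or None if empty


def read_pose_rows(text: str, delimiter: str = ",") -> List[dict]:
    """Parse the exported table into one dict per detected person per frame."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        # Keypoint names come from the header: every "<name>_score" column.
        names = [col[:-len("_score")] for col in rec if col.endswith("_score")]
        keypoints: Dict[str, Keypoint] = {}
        for name in names:
            cells = [(rec.get(f"{name}_{suffix}") or "").strip()
                     for suffix in ("x", "y", "score")]
            # Empty cells mark keypoints below the confidence threshold.
            if all(cells):
                keypoints[name] = (float(cells[0]), float(cells[1]), float(cells[2]))
            else:
                keypoints[name] = None
        rows.append({
            "frame": int(rec["frame"]),
            "person": int(rec["person"]),
            "keypoints": keypoints,
        })
    return rows


# Made-up two-keypoint sample for illustration; real exports have 17 keypoints.
sample = (
    "frame,timestamp,tag,person,nose_score,nose_x,nose_y,"
    "leftEye_score,leftEye_x,leftEye_y\n"
    "0,1700000000000,0,0,0.982,312.4,118.7,,,\n"
)
parsed = read_pose_rows(sample)
```

Returning `None` for a missing keypoint keeps it distinguishable from a keypoint detected at coordinate (0, 0).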

Features

Real-time webcam pose estimation
Upload any video file for analysis
Scrub through the uploaded video frame-by-frame
Download all predictions as a CSV file
Adjustable confidence threshold and display options
FPS counter and inference-latency display
Runs entirely in the browser — no server needed

Model info

Pose model: RTMPose-s — 17 COCO body keypoints, 256 × 192 input
Detection model: RTMDet-nano — person detector, 320 × 320 input
Runtime: ONNX Runtime Web (WebAssembly backend)
Models are downloaded once and cached by the browser.

The first load downloads ~22 MB (pose model) + ~3 MB (detection model) from Hugging Face. Subsequent visits load the models from the browser cache instead of downloading them again.

Source Code

https://github.com/open-mmlab/mmpose/tree/main/projects/rtmpose