Gesture Recognizer
Real-Time Multi-Hand Gesture Recognition
The Gesture Recognizer detects and classifies hand gestures in real time using MediaPipe Tasks Vision (v0.10.3), running entirely in the browser — no data leaves your device. Up to four hands are tracked simultaneously, each returning 21 normalised landmarks (x, y, and a relative depth z; see the CSV schema below) and a gesture class with a confidence score.
Eight gestures are recognised out of the box:
- None (no recognised gesture)
- Closed Fist ✊
- Open Palm ✋
- Pointing Up ☝️
- Thumb Down 👎
- Thumb Up 👍
- Victory ✌️
- I Love You 🤟
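The canned model reports these under the category names None, Closed_Fist, Open_Palm, Pointing_Up, Thumb_Down, Thumb_Up, Victory, and ILoveYou — the same identifiers that appear in the CSV's gesture column. A minimal TypeScript lookup makes the link explicit; the emoji pairing is this page's own display convention, not part of the model:

```ts
// Maps MediaPipe category names to the emoji used for on-canvas labels.
// Category names come from the canned model; the emoji pairing is ours.
const GESTURE_EMOJI: Record<string, string> = {
  None: "",
  Closed_Fist: "✊",
  Open_Palm: "✋",
  Pointing_Up: "☝️",
  Thumb_Down: "👎",
  Thumb_Up: "👍",
  Victory: "✌️",
  ILoveYou: "🤟",
};
```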
The underlying model is a two-stage pipeline: a hand detector localises each hand in
the frame, then a landmark model regresses the 21 keypoints. A lightweight gesture
classification head runs on top of the landmarks to produce the category and score.
All three stages are bundled in a single .task file and executed via
WebAssembly + WebGL, keeping latency below 15 ms on most devices.
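A minimal sketch of driving this pipeline from the browser with the @mediapipe/tasks-vision API; the CDN URL and model path below are illustrative, not this project's actual asset locations:

```ts
import { FilesetResolver, GestureRecognizer } from "@mediapipe/tasks-vision";

// Resolve the WASM runtime (CDN path shown for illustration).
const vision = await FilesetResolver.forVisionTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.3/wasm"
);

// The single .task bundle holds all three stages:
// detector, landmark model, and gesture classifier.
const recognizer = await GestureRecognizer.createFromOptions(vision, {
  baseOptions: {
    modelAssetPath: "gesture_recognizer.task", // assumed asset path
    delegate: "GPU",                           // WebGL-backed execution
  },
  runningMode: "VIDEO",
});

// One inference per displayed frame; timestamps must increase monotonically.
function loop(video: HTMLVideoElement) {
  const result = recognizer.recognizeForVideo(video, performance.now());
  // result.landmarks: 21 normalised keypoints per detected hand
  // result.gestures:  ranked gesture categories with scores per hand
  requestAnimationFrame(() => loop(video));
}
```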
Detection sensitivity can be tuned from the settings panel:
- Detection confidence — minimum score to accept a new hand detection.
- Tracking confidence — minimum score to keep tracking an existing hand across frames.
- Gesture confidence — minimum score for a gesture label to be displayed on the canvas.
- Max hands — upper bound on how many hands are tracked at once (1–4).
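These settings map onto the recogniser's options. A sketch, assuming a hypothetical PanelSettings shape for the panel state; per the list above, the gesture confidence gates display only, so it is applied when drawing rather than passed to the model:

```ts
import { GestureRecognizer } from "@mediapipe/tasks-vision";

// Hypothetical shape of the settings panel state; names are illustrative.
interface PanelSettings {
  detectionConf: number; // Detection confidence
  trackingConf: number;  // Tracking confidence
  gestureConf: number;   // Gesture confidence (display-side filter)
  maxHands: number;      // Max hands (1–4)
}

// Re-configure a live recognizer whenever the panel changes.
async function applySettings(rec: GestureRecognizer, s: PanelSettings) {
  await rec.setOptions({
    numHands: s.maxHands,
    minHandDetectionConfidence: s.detectionConf,
    minTrackingConfidence: s.trackingConf,
  });
}
```

The gesture threshold would then be checked against each hand's top gesture score before its label is drawn on the canvas.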
Recorded landmark and gesture data is exported as a CSV file structured as follows:
| frame | timestamp | tag | hand | side | gesture | confidence | lm0_x | lm0_y | lm0_z | … | lm20_x | lm20_y | lm20_z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 1718000000 | 0 | 0 | Right | Thumb_Up | 0.9821 | 0.512 | 0.748 | -0.031 | … | 0.421 | 0.302 | -0.089 |
Landmark coordinates are in MediaPipe's normalised image space (x, y ∈ [0, 1], origin at top-left). The z value is depth relative to the wrist — negative values are closer to the camera. One row is written per detected hand per inference frame; frames where no hand is detected are omitted.
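A sketch of how one such row could be assembled from a recognition result. The frame, timestamp, and tag values come from the recording loop and their exact semantics are assumed here, as are treating hand as the per-frame hand index and the handedness field name:

```ts
import { GestureRecognizerResult } from "@mediapipe/tasks-vision";

// Builds one CSV row per detected hand, in the column order of the table above.
function toCsvRows(
  result: GestureRecognizerResult,
  frame: number,
  timestamp: number,
  tag: number
): string[] {
  return result.landmarks.map((hand, i) => {
    const gesture = result.gestures[i][0];              // top-scoring category
    const side = result.handedness[i][0].categoryName;  // "Left" or "Right"
    const lm = hand.flatMap((p) => [p.x, p.y, p.z]);    // lm0_x … lm20_z
    return [
      frame, timestamp, tag, i, side,
      gesture.categoryName, gesture.score.toFixed(4),
      ...lm.map((v) => v.toFixed(3)),
    ].join(",");
  });
}
```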
Reference: Google, *MediaPipe Gesture Recognizer*, [developers.google.com/mediapipe/solutions/vision/gesture_recognizer](https://developers.google.com/mediapipe/solutions/vision/gesture_recognizer)