Gesture Recognizer
Real-Time Multi-Hand Gesture Recognition
The Gesture Recognizer detects and classifies hand gestures in real time using MediaPipe Tasks Vision (v0.10.3), running entirely in the browser — no data leaves your device. Up to four hands are tracked simultaneously, each returning 21 normalised landmarks (x, y, and a relative depth z; see the CSV schema below) and a gesture class with a confidence score.
Eight gestures are recognised out of the box:
- None (no recognised gesture)
- Closed Fist ✊
- Open Palm ✋
- Pointing Up ☝️
- Thumb Down 👎
- Thumb Up 👍
- Victory ✌️
- I Love You 🤟
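The canned model reports these under the category names None, Closed_Fist, Open_Palm, Pointing_Up, Thumb_Down, Thumb_Up, Victory, and ILoveYou — the same identifiers that appear in the CSV's gesture column. A minimal TypeScript lookup makes the link explicit; the emoji pairing is this page's own display convention, not part of the model:

```ts
// Maps MediaPipe category names to the emoji used for on-canvas labels.
// Category names come from the canned model; the emoji pairing is ours.
const GESTURE_EMOJI: Record<string, string> = {
  None: "",
  Closed_Fist: "✊",
  Open_Palm: "✋",
  Pointing_Up: "☝️",
  Thumb_Down: "👎",
  Thumb_Up: "👍",
  Victory: "✌️",
  ILoveYou: "🤟",
};
```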
The underlying model is a two-stage pipeline: a hand detector localises each hand in
the frame, then a landmark model regresses the 21 keypoints. A lightweight gesture
classification head runs on top of the landmarks to produce the category and score.
All three stages are bundled in a single .task file and executed via
WebAssembly + WebGL, keeping latency below 15 ms on most devices.
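A minimal sketch of driving this pipeline from the browser with the @mediapipe/tasks-vision API; the CDN URL and model path below are illustrative, not this project's actual asset locations:

```ts
import { FilesetResolver, GestureRecognizer } from "@mediapipe/tasks-vision";

// Resolve the WASM runtime (CDN path shown for illustration).
const vision = await FilesetResolver.forVisionTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.3/wasm"
);

// The single .task bundle holds all three stages:
// detector, landmark model, and gesture classifier.
const recognizer = await GestureRecognizer.createFromOptions(vision, {
  baseOptions: {
    modelAssetPath: "gesture_recognizer.task", // assumed asset path
    delegate: "GPU",                           // WebGL-backed execution
  },
  runningMode: "VIDEO",
});

// One inference per displayed frame; timestamps must increase monotonically.
function loop(video: HTMLVideoElement) {
  const result = recognizer.recognizeForVideo(video, performance.now());
  // result.landmarks: 21 normalised keypoints per detected hand
  // result.gestures:  ranked gesture categories with scores per hand
  requestAnimationFrame(() => loop(video));
}
```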
Detection sensitivity can be tuned from the settings panel:
- Detection confidence — minimum score to accept a new hand detection.
- Tracking confidence — minimum score to keep tracking an existing hand across frames.
- Gesture confidence — minimum score for a gesture label to be displayed on the canvas.
- Max hands — upper bound on how many hands are tracked at once (1–4).
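These settings map onto the recogniser's options. A sketch, assuming a hypothetical PanelSettings shape for the panel state; per the list above, the gesture confidence gates display only, so it is applied when drawing rather than passed to the model:

```ts
import { GestureRecognizer } from "@mediapipe/tasks-vision";

// Hypothetical shape of the settings panel state; names are illustrative.
interface PanelSettings {
  detectionConf: number; // Detection confidence
  trackingConf: number;  // Tracking confidence
  gestureConf: number;   // Gesture confidence (display-side filter)
  maxHands: number;      // Max hands (1–4)
}

// Re-configure a live recognizer whenever the panel changes.
async function applySettings(rec: GestureRecognizer, s: PanelSettings) {
  await rec.setOptions({
    numHands: s.maxHands,
    minHandDetectionConfidence: s.detectionConf,
    minTrackingConfidence: s.trackingConf,
  });
}
```

The gesture threshold would then be checked against each hand's top gesture score before its label is drawn on the canvas.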
Recorded landmark and gesture data is exported as a CSV file structured as follows:
| frame | timestamp | tag | hand | side | gesture | confidence | lm0_x | lm0_y | lm0_z | … | lm20_x | lm20_y | lm20_z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 1718000000 | 0 | 0 | Right | Thumb_Up | 0.9821 | 0.512 | 0.748 | -0.031 | … | 0.421 | 0.302 | -0.089 |
Landmark coordinates are in MediaPipe's normalised image space (x, y ∈ [0, 1], origin at top-left). The z value is depth relative to the wrist — negative values are closer to the camera. One row is written per detected hand per inference frame; frames where no hand is detected are omitted.
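A sketch of how one such row could be assembled from a recognition result. The frame, timestamp, and tag values come from the recording loop and their exact semantics are assumed here, as are treating hand as the per-frame hand index and the handedness field name:

```ts
import { GestureRecognizerResult } from "@mediapipe/tasks-vision";

// Builds one CSV row per detected hand, in the column order of the table above.
function toCsvRows(
  result: GestureRecognizerResult,
  frame: number,
  timestamp: number,
  tag: number
): string[] {
  return result.landmarks.map((hand, i) => {
    const gesture = result.gestures[i][0];              // top-scoring category
    const side = result.handedness[i][0].categoryName;  // "Left" or "Right"
    const lm = hand.flatMap((p) => [p.x, p.y, p.z]);    // lm0_x … lm20_z
    return [
      frame, timestamp, tag, i, side,
      gesture.categoryName, gesture.score.toFixed(4),
      ...lm.map((v) => v.toFixed(3)),
    ].join(",");
  });
}
```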
Reference: Google, *MediaPipe Gesture Recognizer*, [developers.google.com/mediapipe/solutions/vision/gesture_recognizer](https://developers.google.com/mediapipe/solutions/vision/gesture_recognizer)