Gesture Recognizer

Real-Time Multi-Hand Gesture Recognition


The Gesture Recognizer detects and classifies hand gestures in real time using MediaPipe Tasks Vision (v0.10.3), running entirely in the browser — no data leaves your device. Up to four hands are tracked simultaneously, each returning 21 normalised landmarks (x, y, z) and a gesture class with a confidence score.

Eight gestures are recognised out of the box:

  1. None — no gesture recognised
  2. Closed Fist ✊
  3. Open Palm ✋
  4. Pointing Up ☝️
  5. Thumb Down 👎
  6. Thumb Up 👍
  7. Victory ✌️
  8. I Love You 🤟

The underlying model is a two-stage pipeline: a hand detector localises each hand in the frame, then a landmark model regresses the 21 keypoints. A lightweight gesture classification head runs on top of the landmarks to produce the category and score. All three stages are bundled in a single .task file and executed via WebAssembly + WebGL, keeping latency below 15 ms on most devices.
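A minimal sketch of wiring this pipeline up with the Tasks Vision JS API. The CDN URL and the `gesture_recognizer.task` model path are assumptions — adapt them to your deployment:

```javascript
// Sketch of the browser setup for MediaPipe Tasks Vision 0.10.3.
// Not the app's actual source; paths/URLs are placeholders.
async function setupAndRecognize(videoElement) {
  const { FilesetResolver, GestureRecognizer } = await import("@mediapipe/tasks-vision");

  // Load the WASM runtime that executes all three model stages.
  const vision = await FilesetResolver.forVisionTasks(
    "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.3/wasm"
  );

  const recognizer = await GestureRecognizer.createFromOptions(vision, {
    baseOptions: {
      modelAssetPath: "gesture_recognizer.task", // bundled detector + landmarker + classifier
      delegate: "GPU",                           // WebGL; falls back to CPU/WASM if unavailable
    },
    runningMode: "VIDEO",
    numHands: 4, // track up to four hands simultaneously
  });

  // Call once per video frame with a monotonic timestamp.
  const result = recognizer.recognizeForVideo(videoElement, performance.now());
  // result.gestures[i][0].categoryName / .score  → gesture class + confidence
  // result.landmarks[i]                          → 21 normalised keypoints per hand
  return result;
}
```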

Detection sensitivity can be tuned from the settings panel.
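For reference, these are the confidence thresholds exposed by MediaPipe's `GestureRecognizerOptions` (the option names come from the Tasks Vision API; whether the settings panel maps to all three is an assumption):

```javascript
// Confidence thresholds accepted by GestureRecognizer.createFromOptions()
// or setOptions(); each defaults to 0.5.
const sensitivityOptions = {
  minHandDetectionConfidence: 0.5, // palm-detector score required to count as a hand
  minHandPresenceConfidence: 0.5,  // landmark-model score required to keep tracking
  minTrackingConfidence: 0.5,      // tracking score below which detection re-runs
};
```

Raising the thresholds reduces false detections at the cost of dropping marginal hands; lowering them does the opposite.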

The data is exported as a CSV file structured as follows:

| frame | timestamp | tag | handside | gesture | confidence | lm0_x | lm0_y | lm0_z | … | lm20_x | lm20_y | lm20_z |
|-------|-----------|-----|----------|---------|------------|-------|-------|-------|---|--------|--------|--------|
| 12 | 171800000000 | | Right | Thumb_Up | 0.9821 | 0.512 | 0.748 | -0.031 | … | 0.421 | 0.302 | -0.089 |

Landmark coordinates are in MediaPipe's normalised image space (x, y ∈ [0, 1], origin at top-left). The z value is depth relative to the wrist — negative values are closer to the camera. One row is written per detected hand per inference frame; frames where no hand is detected are omitted.
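A minimal sketch of mapping the normalised coordinates above into pixel space (the helper name is hypothetical; z is scaled by the image width, which MediaPipe documents as being roughly the same scale as x):

```javascript
// Convert one normalised MediaPipe landmark to pixel coordinates.
// lm: { x, y, z } with x, y in [0, 1] and z relative to the wrist.
function toPixels(lm, imageWidth, imageHeight) {
  return {
    x: lm.x * imageWidth,  // 0 = left edge, imageWidth = right edge
    y: lm.y * imageHeight, // 0 = top edge, imageHeight = bottom edge
    z: lm.z * imageWidth,  // negative → closer to the camera than the wrist
  };
}
```

For a 640×480 frame, a landmark at (0.5, 0.25) lands at pixel (320, 120).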

Reference: Google. MediaPipe Gesture Recognizer. developers.google.com/mediapipe/solutions/vision/gesture_recognizer

Source Code