MS-G3D

Real-Time Single-Hand Gesture Recognition


MS-G3D is a real-time hand gesture recognition tool powered by the Multi-Scale Graph 3D Convolutional Network architecture, running entirely in your browser with no data leaving your device. It uses MediaPipe Hands to extract 21 3D landmarks per hand, then feeds a temporal sequence of those landmarks through the MS-G3D pipeline implemented in TensorFlow.js.
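As a rough sketch of that data flow, the per-frame landmarks can be buffered and packed into the (batch, channels, frames, joints) tensor layout that skeleton GCNs typically expect. The window length T = 32 and the exact layout below are illustrative assumptions, not the tool's actual values.

  import * as tf from "@tensorflow/tfjs";

  // One MediaPipe Hands result: 21 landmarks, each with x, y, z.
  type Landmark = { x: number; y: number; z: number };

  const T = 32; // assumed sliding-window length (illustrative)
  const frameBuffer: Landmark[][] = []; // most recent T frames

  function pushFrame(landmarks: Landmark[]): void {
    frameBuffer.push(landmarks);
    if (frameBuffer.length > T) frameBuffer.shift(); // drop the oldest frame
  }

  // Pack the window into a [1, C=3, T, V=21] tensor:
  // (batch, coordinate channels, frames, joints).
  function toInputTensor(): tf.Tensor4D {
    const data = frameBuffer.map(frame => frame.map(p => [p.x, p.y, p.z])); // [T, 21, 3]
    return tf.tensor(data)            // [T, V, C]
      .transpose([2, 0, 1])           // [C, T, V]
      .expandDims(0) as tf.Tensor4D;  // [1, C, T, V]
  }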

The MS-G3D architecture (Liu et al., CVPR 2020) combines two graph convolution branches:

  1. Multi-Scale Graph Convolution (MS-GCN): aggregates spatial features from k-hop neighbourhoods on the hand skeleton graph (scales k = 1, 2, 3), letting the network capture both local finger articulations and global hand shape simultaneously (the k-hop construction is sketched after this list).
  2. Spatio-Temporal G3D: unfolds a sliding temporal window across the skeleton sequence and applies a joint spatial–temporal convolution, capturing how joints move in relation to each other over time.
The two branches are summed after each block, followed by global average pooling and a linear classification head.
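A minimal sketch of the disentangled k-hop construction used by MS-GCN, assuming a 21×21 binary adjacency matrix for the MediaPipe hand skeleton: entry (i, j) of the scale-k matrix is 1 exactly when the shortest graph distance between joints i and j is k (k = 0 yields the identity, i.e. self-loops).

  // A: 21x21 binary adjacency of the hand skeleton; returns one
  // adjacency matrix per scale k = 0..maxK, where entry (i, j) is 1
  // iff the shortest path between joints i and j has length exactly k.
  function kHopAdjacency(A: number[][], maxK: number): number[][][] {
    const dist = A.map((_, i) => bfsDistances(A, i)); // all-pairs shortest paths
    const scales: number[][][] = [];
    for (let k = 0; k <= maxK; k++) {
      scales.push(dist.map(row => row.map(d => (d === k ? 1 : 0))));
    }
    return scales; // stacked along the scale axis inside MS-GCN
  }

  // Breadth-first search from joint `src` over the unweighted skeleton graph.
  function bfsDistances(A: number[][], src: number): number[] {
    const dist = new Array(A.length).fill(Infinity);
    dist[src] = 0;
    const queue = [src];
    while (queue.length > 0) {
      const u = queue.shift()!;
      A[u].forEach((edge, v) => {
        if (edge && dist[v] === Infinity) {
          dist[v] = dist[u] + 1;
          queue.push(v);
        }
      });
    }
    return dist;
  }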

The tool ships with a geometry-based classifier that works immediately without any training. A built-in custom gesture training mode lets you record your own gesture samples and fine-tune the classification head in-browser (≈2 seconds on CPU).
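A plausible shape for that in-browser fine-tuning step, assuming the MS-G3D backbone is frozen and only a small softmax head is retrained on pooled embeddings; the dimensions, optimizer, and epoch count below are illustrative assumptions, not the tool's actual settings.

  import * as tf from "@tensorflow/tfjs";

  const featureDim = 256; // assumed size of the pooled MS-G3D embedding
  const numGestures = 8;

  // Small trainable head on top of frozen backbone features.
  const head = tf.sequential({
    layers: [tf.layers.dense({
      inputShape: [featureDim],
      units: numGestures,
      activation: "softmax",
    })],
  });
  head.compile({
    optimizer: tf.train.adam(1e-2),
    loss: "categoricalCrossentropy",
  });

  // features: [N, featureDim] embeddings of the recorded samples;
  // labels: [N, numGestures] one-hot gesture labels.
  async function fineTune(features: tf.Tensor2D, labels: tf.Tensor2D) {
    await head.fit(features, labels, { epochs: 30, batchSize: 16 });
  }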

The following 8 gestures are recognised out of the box:

  1. Fist ✊
  2. Open Hand ✋
  3. Point ☝️
  4. Peace ✌️
  5. Thumbs Up 👍
  6. Rock 🤘
  7. Call Me 🤙
  8. OK 👌
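For the geometry-based classifier, rules of roughly the following kind suffice; the extended-finger test and the two example rules below are illustrative assumptions, not the tool's actual thresholds.

  type Landmark = { x: number; y: number; z: number };

  function dist(a: Landmark, b: Landmark): number {
    return Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z);
  }

  // A finger counts as extended when its tip (MediaPipe indices 4, 8,
  // 12, 16, 20) lies farther from the wrist (index 0) than its PIP
  // joint (IP joint for the thumb).
  function extendedFingers(lm: Landmark[]): boolean[] {
    const wrist = lm[0];
    const tips = [4, 8, 12, 16, 20];
    const pips = [3, 6, 10, 14, 18];
    return tips.map((t, i) => dist(lm[t], wrist) > dist(lm[pips[i]], wrist));
  }

  // Two illustrative rules; the remaining gestures follow the same pattern.
  function classify(lm: Landmark[]): string | null {
    const [thumb, index, middle, ring, pinky] = extendedFingers(lm);
    if (index && middle && !ring && !pinky && !thumb) return "Peace";
    if (![thumb, index, middle, ring, pinky].some(Boolean)) return "Fist";
    return null; // defer to the other gesture rules
  }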

The data is exported as a CSV file structured as follows:

frame,timestamp,tag,hand,gesture,confidence,joint_0_x,joint_0_y,joint_0_z,…,joint_20_x,joint_20_y,joint_20_z
42,1718000000,baseline,Right,Peace,0.913,0.512,0.748,-0.031,…,0.421,0.302,-0.089

(… marks the elided columns for joints 1 through 19)

Landmark coordinates are in MediaPipe's normalised image space (x, y ∈ [0, 1]). The z value is depth relative to the wrist (negative = closer to camera). Rows where no hand is detected are omitted.
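A small sketch of reading one such row back, assuming the column order shown above (the GestureRow shape is an illustrative name, not part of the export):

  interface GestureRow {
    frame: number;
    timestamp: number;
    tag: string;
    hand: "Left" | "Right";
    gesture: string;
    confidence: number;
    joints: { x: number; y: number; z: number }[]; // 21 entries
  }

  // Parse one CSV data line; the first six columns are metadata,
  // followed by 21 (x, y, z) triples for joints 0 through 20.
  function parseRow(line: string): GestureRow {
    const cols = line.split(",");
    const joints = [];
    for (let j = 0; j < 21; j++) {
      const base = 6 + 3 * j;
      joints.push({ x: +cols[base], y: +cols[base + 1], z: +cols[base + 2] });
    }
    return {
      frame: +cols[0],
      timestamp: +cols[1],
      tag: cols[2],
      hand: cols[3] as "Left" | "Right",
      gesture: cols[4],
      confidence: +cols[5],
      joints,
    };
  }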

Reference: Liu Z., Zhang H., Chen Z., Wang Z., Ouyang W. Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition. CVPR 2020. arXiv:2003.14111

Source Code