EZ-MMLA Toolkit: Sharingan

Sharingan uses a ViT-12 + conditional DPT decoder to predict where each person is looking. Up to 3 people share a single forward pass. Requires WebGPU.

Output columns

Column	Description
`frame`	Frame index (0-based)
`timestamp`	Time in seconds
`face_index`	Person index in frame (0–2)
`face_x1/y1/x2/y2`	Face bounding box (px)
`gaze_x / gaze_y`	Predicted gaze target (px)
`inout_prob`	In-frame gaze probability (0–1)
Gaze vector
`gaze_vx / gaze_vy`	Unit direction vector (face center → gaze target)
`gaze_angle_deg`	Gaze angle in degrees (atan2, 0° = right)
`gaze_dist_px`	Pixel distance: face center → gaze target
Heatmap metrics
`hm_peak`	Peak softmax probability (higher = more certain)
`hm_entropy`	Normalised entropy (0 = certain, 1 = uniform)
`hm_spread_px`	Weighted spatial std-dev of heatmap (px, RMS)
Joint Visual Attention (JVA)
`jva_partner_idx`	Face index of best JVA partner (−1 if alone)
`jva_gaze_dist_px`	Pixel distance between the two gaze targets
`jva_dir_sim`	Cosine similarity of gaze direction vectors (−1 … +1)
`jva_hm_overlap`	Bhattacharyya coefficient of heatmaps (0–1)
`jva_score`	Combined JVA score: proximity + direction + overlap (0–1)
`tag`	User-defined label from the recording bar
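
As a rough illustration of how the gaze vector columns relate to the face box and gaze target, here is a minimal Python sketch. The function name `gaze_vector` is hypothetical and not part of the toolkit. It assumes image coordinates with y increasing downward, so a positive angle points down the frame; the tool itself may use the opposite sign for y.

```python
import math

def gaze_vector(face_box, gaze_xy):
    """Return (gaze_vx, gaze_vy, gaze_angle_deg, gaze_dist_px) for one face.

    face_box is (x1, y1, x2, y2) in pixels, gaze_xy is (gaze_x, gaze_y).
    Assumption: y grows downward, as in image coordinates.
    """
    x1, y1, x2, y2 = face_box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0    # face center
    dx, dy = gaze_xy[0] - cx, gaze_xy[1] - cy    # face center -> gaze target
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        # Degenerate case: the target coincides with the face center.
        return 0.0, 0.0, 0.0, 0.0
    vx, vy = dx / dist, dy / dist                # unit direction vector
    angle = math.degrees(math.atan2(vy, vx))     # 0 degrees = pointing right
    return vx, vy, angle, dist

# Face centered at (150, 150), looking at a point 300 px to its right.
print(gaze_vector((100, 100, 200, 200), (450, 150)))  # (1.0, 0.0, 0.0, 300.0)
```

Recomputing these values from `face_x1/y1/x2/y2` and `gaze_x / gaze_y` is a quick sanity check on exported rows.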

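The heatmap metrics and two of the JVA components follow standard definitions, sketched below in plain Python. All function names and the `px_per_cell` parameter are hypothetical. The sketch assumes a flat, row-major heatmap of raw scores and a uniform scale from heatmap cells to pixels; how the tool scales its heatmap to the frame is not stated here. The weighting used to combine proximity, direction and overlap into `jva_score` is also not documented above, so it is deliberately left out.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a flat list of raw scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def heatmap_metrics(logits, width, height, px_per_cell=1.0):
    """Return (hm_peak, hm_entropy, hm_spread_px) for one heatmap."""
    assert len(logits) == width * height
    p = softmax(logits)
    peak = max(p)                                  # peak softmax probability
    entropy = -sum(q * math.log(q) for q in p if q > 0.0)
    # Divide by log(N) so 0 means certain and 1 means uniform.
    norm_entropy = entropy / math.log(len(p)) if len(p) > 1 else 0.0
    xs = [i % width for i in range(len(p))]        # column of each cell
    ys = [i // width for i in range(len(p))]       # row of each cell
    cx = sum(q * x for q, x in zip(p, xs))         # probability-weighted centroid
    cy = sum(q * y for q, y in zip(p, ys))
    var = sum(q * ((x - cx) ** 2 + (y - cy) ** 2) for q, x, y in zip(p, xs, ys))
    return peak, norm_entropy, math.sqrt(var) * px_per_cell

def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two probability maps (0 to 1)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def cosine_similarity(u, v):
    """Cosine similarity of two 2-D gaze direction vectors (-1 to +1)."""
    nu, nv = math.hypot(*u), math.hypot(*v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return (u[0] * v[0] + u[1] * v[1]) / (nu * nv)

# A flat 4x4 heatmap is maximally uncertain: low peak, entropy near 1.
print(heatmap_metrics([0.0] * 16, 4, 4))
# Two people looking in opposite directions.
print(cosine_similarity((1.0, 0.0), (-1.0, 0.0)))  # -1.0
```

A sharply peaked heatmap drives `hm_peak` toward 1 and both `hm_entropy` and `hm_spread_px` toward 0, which is why the three columns tend to move together.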