Holistic

Pose, Hand Pose, and Face Landmarks All in One


Holistic, by MediaPipe, provides live tracking of human pose, face landmarks, and hands, all in one model.

The model outputs 468 normalized face-landmark coordinates (if a face is detected), 21 normalized hand coordinates for each hand (if detected), and 33 normalized pose coordinates (if detected), with visibility scores indicating how confident the model is that each landmark appears in the frame.

The tool tracks the following 31 coordinates:

  1. right eye inner
  2. right eye outer
  3. left eye inner
  4. left eye outer
  5. lip right corner
  6. lip left corner
  7. right hand palm
  8. right thumb
  9. right index finger
  10. right middle finger
  11. right ring finger
  12. right pinky finger
  13. left thumb
  14. left index finger
  15. left middle finger
  16. left ring finger
  17. left pinky finger
  18. left shoulder
  19. right shoulder
  20. left elbow
  21. right elbow
  22. left hip
  23. right hip
  24. left knee
  25. right knee
  26. left ankle
  27. right ankle
  28. left heel
  29. right heel
  30. left foot index
  31. right foot index
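Assuming these points are read from MediaPipe's standard landmark sets, the 31 names above plausibly correspond to the landmark indices below (body and face points from the 33-point pose set, fingertip points from the 21-point hand sets, and "palm" taken as the hand wrist). This mapping is an illustration of how the names line up with MediaPipe's published indices, not the tool's actual code:

```python
# Hypothetical mapping from the 31 tracked point names to MediaPipe landmark
# indices. Face and body points come from the 33-point pose set; finger points
# from the 21-point hand sets. "Palm" is assumed to be the hand wrist (index 0).
POSE = "pose_landmarks"
RIGHT_HAND = "right_hand_landmarks"
LEFT_HAND = "left_hand_landmarks"

TRACKED_POINTS = {
    "right eye inner": (POSE, 4),
    "right eye outer": (POSE, 6),
    "left eye inner": (POSE, 1),
    "left eye outer": (POSE, 3),
    "lip right corner": (POSE, 10),  # mouth_right
    "lip left corner": (POSE, 9),    # mouth_left
    "right hand palm": (RIGHT_HAND, 0),       # wrist
    "right thumb": (RIGHT_HAND, 4),           # thumb tip
    "right index finger": (RIGHT_HAND, 8),    # index fingertip
    "right middle finger": (RIGHT_HAND, 12),
    "right ring finger": (RIGHT_HAND, 16),
    "right pinky finger": (RIGHT_HAND, 20),
    "left thumb": (LEFT_HAND, 4),
    "left index finger": (LEFT_HAND, 8),
    "left middle finger": (LEFT_HAND, 12),
    "left ring finger": (LEFT_HAND, 16),
    "left pinky finger": (LEFT_HAND, 20),
    "left shoulder": (POSE, 11),
    "right shoulder": (POSE, 12),
    "left elbow": (POSE, 13),
    "right elbow": (POSE, 14),
    "left hip": (POSE, 23),
    "right hip": (POSE, 24),
    "left knee": (POSE, 25),
    "right knee": (POSE, 26),
    "left ankle": (POSE, 27),
    "right ankle": (POSE, 28),
    "left heel": (POSE, 29),
    "right heel": (POSE, 30),
    "left foot index": (POSE, 31),
    "right foot index": (POSE, 32),
}

assert len(TRACKED_POINTS) == 31
```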

The data is exported as a CSV file structured as follows (header followed by two sample rows):

```
Frame Timestamp right eye inner x pos right eye inner y pos right eye outer x pos right eye outer y pos left eye inner x pos left eye inner y pos left eye outer x pos left eye outer y pos lip right corner x pos lip right corner y pos lip left corner x pos lip left corner y pos right hand palm x pos right hand palm y pos right thumb x pos right thumb y pos right index finger x pos right index finger y pos right middle finger x pos right middle finger y pos right ring finger x pos right ring finger y pos right pinky finger x pos right pinky finger y pos left thumb x pos left thumb y pos left index finger x pos left index finger y pos left middle finger x pos left middle finger y pos left ring finger x pos left ring finger y pos left pinky finger x pos left pinky finger y pos left shoulder x pos left shoulder y pos right shoulder x pos right shoulder y pos left elbow x pos left elbow y pos right elbow x pos right elbow y pos left hip x pos left hip y pos right hip x pos right hip y pos left knee x pos left knee y pos right knee x pos right knee y pos left ankle x pos left ankle y pos right ankle x pos right ankle y pos left heel x pos left heel y pos right heel x pos right heel y pos left foot index x pos left foot index y pos right foot index x pos right foot index y pos
0 11 251.2331 239.0843 208.1723 238.9472 290.5730 240.7219 332.6023 242.9274 230.8592 318.2940 305.4467 320.0349 251.2331 239.0843 208.1723 238.9472 290.5730 240.7219 332.6023 242.9274 230.8592 318.2940 305.4467 238.9472 290.5730 240.7219 332.6023 242.9274 230.8592 318.2940 305.4467 251.2331 239.0843 208.1723 238.9472 290.5730 240.7219 332.6023 242.9274 230.8592 318.2940 305.4467 320.0349 251.2331 239.0843 208.1723 238.9472 290.5730 240.7219 332.6023 242.9274 230.8592 318.2940 305.4467 238.9472 290.5730 240.7219 332.6023 242.9274 230.8592 318.2940 305.4467
1 51 251.2331 239.0843 208.1723 238.9472 290.5730 240.7219 332.6023 242.9274 230.8592 318.2940 305.4467 320.0349 251.2331 239.0843 208.1723 238.9472 290.5730 240.7219 332.6023 242.9274 230.8592 318.2940 305.4467 238.9472 290.5730 240.7219 332.6023 242.9274 230.8592 318.2940 305.4467 251.2331 239.0843 208.1723 238.9472 290.5730 240.7219 332.6023 242.9274 230.8592 318.2940 305.4467 320.0349 251.2331 239.0843 208.1723 238.9472 290.5730 240.7219 332.6023 242.9274 230.8592 318.2940 305.4467 238.9472 290.5730 240.7219 332.6023 242.9274 230.8592 318.2940 305.4467
```
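A short sketch of reading the export back in with Python's standard `csv` module. The sample string below keeps only the first two coordinate columns for brevity, and its second data row uses made-up values; the real export has all 62 coordinate columns after Frame and Timestamp:

```python
import csv
import io

# Abbreviated sample of the exported file (illustrative values).
sample = """\
Frame,Timestamp,right eye inner x pos,right eye inner y pos
0,11,251.2331,239.0843
1,51,251.4002,239.1105
"""

def point_track(csv_text, point_name):
    """Return [(frame, timestamp_ms, x, y), ...] for one named point."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        rows.append((
            int(row["Frame"]),
            int(row["Timestamp"]),
            float(row[f"{point_name} x pos"]),
            float(row[f"{point_name} y pos"]),
        ))
    return rows

track = point_track(sample, "right eye inner")
print(track[0])  # (0, 11, 251.2331, 239.0843)
```

Looking columns up by header name rather than by position keeps the reader working even if the column order changes between exports.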

The Frame column gives the frame number within the video; the Timestamp column gives the time in milliseconds since the model started running (i.e., since recording began); the remaining columns give the (x, y) coordinates of the detected features.

The model accepts the following configuration parameters:
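Because timestamps are in milliseconds, the effective capture rate can be estimated from the gaps between consecutive rows. A small sketch (the timestamp values are illustrative, not real export data):

```python
# Estimate the effective frame rate from the millisecond timestamps of
# consecutive rows (illustrative values only).
timestamps_ms = [11, 51, 91, 130, 170]

deltas = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
mean_delta_ms = sum(deltas) / len(deltas)
fps = 1000.0 / mean_delta_ms
print(round(fps, 1))  # 25.2
```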

  1. model_complexity - Complexity of the pose landmark model: 0, 1, or 2. Landmark accuracy as well as inference latency generally go up with the model complexity. Default to 1.
  2. smooth_landmarks - If set to true, the solution filters pose landmarks across different input images to reduce jitter. Default to true.
  3. enable_segmentation - If set to true, in addition to the pose, face, and hand landmarks, the solution also generates the segmentation mask. Default to false.
  4. smooth_segmentation - If set to true, the solution filters segmentation masks across different input images to reduce jitter. Default to true.
  5. refine_face_landmarks - If set to true, the solution further refines the landmark coordinates around the eyes and lips, and outputs additional landmarks around the irises. Default to false.
  6. min_detection_confidence - Minimum confidence value ([0.0, 1.0]) from the person-detection model for the detection to be considered successful. Default to 0.5.
  7. min_tracking_confidence - Minimum confidence value ([0.0, 1.0]) from the landmark-tracking model for the landmarks to be considered tracked successfully; otherwise, person detection is invoked automatically on the next input image. Setting it to a higher value can increase robustness of the solution, at the expense of a higher latency. Default to 0.5.
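These options are passed when the solution is constructed. As a sketch, the hypothetical helper below (not part of the tool or of MediaPipe) checks a configuration dictionary against the documented ranges before use:

```python
# Hypothetical helper: validate a Holistic configuration against the
# documented parameter ranges and types before passing it to the solution.
def validate_config(config):
    if config.get("model_complexity", 1) not in (0, 1, 2):
        raise ValueError("model_complexity must be 0, 1 or 2")
    for key in ("min_detection_confidence", "min_tracking_confidence"):
        value = config.get(key, 0.5)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{key} must be in [0.0, 1.0]")
    for key in ("smooth_landmarks", "enable_segmentation",
                "smooth_segmentation", "refine_face_landmarks"):
        if not isinstance(config.get(key, False), bool):
            raise ValueError(f"{key} must be a boolean")
    return config

cfg = validate_config({
    "model_complexity": 1,
    "smooth_landmarks": True,
    "enable_segmentation": False,
    "smooth_segmentation": True,
    "refine_face_landmarks": False,
    "min_detection_confidence": 0.5,
    "min_tracking_confidence": 0.5,
})
```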

Source Code