Holistic

Pose, Hand Pose, and Face Landmarks All in One


Holistic, by MediaPipe, provides live tracking of human pose, face landmarks, and hands, all in one model.

The model outputs 468 normalized face-landmark coordinates (if a face is detected), 21 normalized hand coordinates for each hand (if detected), and 33 normalized pose coordinates (if detected), with visibility scores indicating how confident the model is that each landmark appears in the frame.

The tool tracks the following 31 coordinates:

  1. right eye inner
  2. right eye outer
  3. left eye inner
  4. left eye outer
  5. lip right corner
  6. lip left corner
  7. right hand palm
  8. right thumb
  9. right index finger
  10. right middle finger
  11. right ring finger
  12. right pinky finger
  13. left thumb
  14. left index finger
  15. left middle finger
  16. left ring finger
  17. left pinky finger
  18. left shoulder
  19. right shoulder
  20. left elbow
  21. right elbow
  22. left hip
  23. right hip
  24. left knee
  25. right knee
  26. left ankle
  27. right ankle
  28. left heel
  29. right heel
  30. left foot index
  31. right foot index
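Assuming these points are read from MediaPipe's standard landmark sets, the 31 names above plausibly correspond to the landmark indices below (body and face points from the 33-point pose set, fingertip points from the 21-point hand sets, and "palm" taken as the hand wrist). This mapping is an illustration of how the names line up with MediaPipe's published indices, not the tool's actual code:

```python
# Hypothetical mapping from the 31 tracked point names to MediaPipe landmark
# indices. Face and body points come from the 33-point pose set; finger points
# from the 21-point hand sets. "Palm" is assumed to be the hand wrist (index 0).
POSE = "pose_landmarks"
RIGHT_HAND = "right_hand_landmarks"
LEFT_HAND = "left_hand_landmarks"

TRACKED_POINTS = {
    "right eye inner": (POSE, 4),
    "right eye outer": (POSE, 6),
    "left eye inner": (POSE, 1),
    "left eye outer": (POSE, 3),
    "lip right corner": (POSE, 10),  # mouth_right
    "lip left corner": (POSE, 9),    # mouth_left
    "right hand palm": (RIGHT_HAND, 0),       # wrist
    "right thumb": (RIGHT_HAND, 4),           # thumb tip
    "right index finger": (RIGHT_HAND, 8),    # index fingertip
    "right middle finger": (RIGHT_HAND, 12),
    "right ring finger": (RIGHT_HAND, 16),
    "right pinky finger": (RIGHT_HAND, 20),
    "left thumb": (LEFT_HAND, 4),
    "left index finger": (LEFT_HAND, 8),
    "left middle finger": (LEFT_HAND, 12),
    "left ring finger": (LEFT_HAND, 16),
    "left pinky finger": (LEFT_HAND, 20),
    "left shoulder": (POSE, 11),
    "right shoulder": (POSE, 12),
    "left elbow": (POSE, 13),
    "right elbow": (POSE, 14),
    "left hip": (POSE, 23),
    "right hip": (POSE, 24),
    "left knee": (POSE, 25),
    "right knee": (POSE, 26),
    "left ankle": (POSE, 27),
    "right ankle": (POSE, 28),
    "left heel": (POSE, 29),
    "right heel": (POSE, 30),
    "left foot index": (POSE, 31),
    "right foot index": (POSE, 32),
}

assert len(TRACKED_POINTS) == 31
```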

The data is exported as a CSV file structured as follows (header followed by two sample rows):

```
Frame Timestamp right eye inner x pos right eye inner y pos right eye outer x pos right eye outer y pos left eye inner x pos left eye inner y pos left eye outer x pos left eye outer y pos lip right corner x pos lip right corner y pos lip left corner x pos lip left corner y pos right hand palm x pos right hand palm y pos right thumb x pos right thumb y pos right index finger x pos right index finger y pos right middle finger x pos right middle finger y pos right ring finger x pos right ring finger y pos right pinky finger x pos right pinky finger y pos left thumb x pos left thumb y pos left index finger x pos left index finger y pos left middle finger x pos left middle finger y pos left ring finger x pos left ring finger y pos left pinky finger x pos left pinky finger y pos left shoulder x pos left shoulder y pos right shoulder x pos right shoulder y pos left elbow x pos left elbow y pos right elbow x pos right elbow y pos left hip x pos left hip y pos right hip x pos right hip y pos left knee x pos left knee y pos right knee x pos right knee y pos left ankle x pos left ankle y pos right ankle x pos right ankle y pos left heel x pos left heel y pos right heel x pos right heel y pos left foot index x pos left foot index y pos right foot index x pos right foot index y pos
0 11 251.2331 239.0843 208.1723 238.9472 290.5730 240.7219 332.6023 242.9274 230.8592 318.2940 305.4467 320.0349 251.2331 239.0843 208.1723 238.9472 290.5730 240.7219 332.6023 242.9274 230.8592 318.2940 305.4467 238.9472 290.5730 240.7219 332.6023 242.9274 230.8592 318.2940 305.4467 251.2331 239.0843 208.1723 238.9472 290.5730 240.7219 332.6023 242.9274 230.8592 318.2940 305.4467 320.0349 251.2331 239.0843 208.1723 238.9472 290.5730 240.7219 332.6023 242.9274 230.8592 318.2940 305.4467 238.9472 290.5730 240.7219 332.6023 242.9274 230.8592 318.2940 305.4467
1 51 251.2331 239.0843 208.1723 238.9472 290.5730 240.7219 332.6023 242.9274 230.8592 318.2940 305.4467 320.0349 251.2331 239.0843 208.1723 238.9472 290.5730 240.7219 332.6023 242.9274 230.8592 318.2940 305.4467 238.9472 290.5730 240.7219 332.6023 242.9274 230.8592 318.2940 305.4467 251.2331 239.0843 208.1723 238.9472 290.5730 240.7219 332.6023 242.9274 230.8592 318.2940 305.4467 320.0349 251.2331 239.0843 208.1723 238.9472 290.5730 240.7219 332.6023 242.9274 230.8592 318.2940 305.4467 238.9472 290.5730 240.7219 332.6023 242.9274 230.8592 318.2940 305.4467
```
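A short sketch of reading the export back in with Python's standard `csv` module. The sample string below keeps only the first two coordinate columns for brevity, and its second data row uses made-up values; the real export has all 62 coordinate columns after Frame and Timestamp:

```python
import csv
import io

# Abbreviated sample of the exported file (illustrative values).
sample = """\
Frame,Timestamp,right eye inner x pos,right eye inner y pos
0,11,251.2331,239.0843
1,51,251.4002,239.1105
"""

def point_track(csv_text, point_name):
    """Return [(frame, timestamp_ms, x, y), ...] for one named point."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        rows.append((
            int(row["Frame"]),
            int(row["Timestamp"]),
            float(row[f"{point_name} x pos"]),
            float(row[f"{point_name} y pos"]),
        ))
    return rows

track = point_track(sample, "right eye inner")
print(track[0])  # (0, 11, 251.2331, 239.0843)
```

Looking columns up by header name rather than by position keeps the reader working even if the column order changes between exports.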

The Frame column gives the frame number within the video; the Timestamp column gives the time in milliseconds since the model started running (i.e., since recording began); the remaining columns give the (x, y) coordinates of the detected features.

The model accepts the following configuration parameters:
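Because timestamps are in milliseconds, the effective capture rate can be estimated from the gaps between consecutive rows. A small sketch (the timestamp values are illustrative, not real export data):

```python
# Estimate the effective frame rate from the millisecond timestamps of
# consecutive rows (illustrative values only).
timestamps_ms = [11, 51, 91, 130, 170]

deltas = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
mean_delta_ms = sum(deltas) / len(deltas)
fps = 1000.0 / mean_delta_ms
print(round(fps, 1))  # 25.2
```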

  1. model_complexity - Complexity of the pose landmark model: 0, 1, or 2. Landmark accuracy as well as inference latency generally go up with the model complexity. Default to 1.
  2. smooth_landmarks - If set to true, the solution filters pose landmarks across different input images to reduce jitter. Default to true.
  3. enable_segmentation - If set to true, in addition to the pose, face, and hand landmarks, the solution also generates the segmentation mask. Default to false.
  4. smooth_segmentation - If set to true, the solution filters segmentation masks across different input images to reduce jitter. Default to true.
  5. refine_face_landmarks - If set to true, the solution further refines the landmark coordinates around the eyes and lips, and outputs additional landmarks around the irises. Default to false.
  6. min_detection_confidence - Minimum confidence value ([0.0, 1.0]) from the person-detection model for the detection to be considered successful. Default to 0.5.
  7. min_tracking_confidence - Minimum confidence value ([0.0, 1.0]) from the landmark-tracking model for the landmarks to be considered tracked successfully; otherwise, person detection is invoked automatically on the next input image. Setting it to a higher value can increase robustness of the solution, at the expense of a higher latency. Default to 0.5.
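These options are passed when the solution is constructed. As a sketch, the hypothetical helper below (not part of the tool or of MediaPipe) checks a configuration dictionary against the documented ranges before use:

```python
# Hypothetical helper: validate a Holistic configuration against the
# documented parameter ranges and types before passing it to the solution.
def validate_config(config):
    if config.get("model_complexity", 1) not in (0, 1, 2):
        raise ValueError("model_complexity must be 0, 1 or 2")
    for key in ("min_detection_confidence", "min_tracking_confidence"):
        value = config.get(key, 0.5)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{key} must be in [0.0, 1.0]")
    for key in ("smooth_landmarks", "enable_segmentation",
                "smooth_segmentation", "refine_face_landmarks"):
        if not isinstance(config.get(key, False), bool):
            raise ValueError(f"{key} must be a boolean")
    return config

cfg = validate_config({
    "model_complexity": 1,
    "smooth_landmarks": True,
    "enable_segmentation": False,
    "smooth_segmentation": True,
    "refine_face_landmarks": False,
    "min_detection_confidence": 0.5,
    "min_tracking_confidence": 0.5,
})
```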

Source Code