PoseNet

Pose Estimation

The PoseNet tool detects key body points in human figures using the PoseNet model. The model can run with either a single-person or a multi-person detection algorithm. The single-person detector is faster and simpler but requires that only one person be present in the frame, whereas the multi-person detector can detect many people at once but is slightly slower.

The model outputs a confidence score and a series of key points for each person in a frame of a video recording. The confidence score represents the probability that a person has been correctly detected. It ranges from 0 to 1, where 1 indicates the highest confidence. The key points represent body parts, each with its own confidence score and position. A key point's confidence score represents the probability that the key point has been accurately detected. A key point's position gives the coordinates of the point within the frame, expressed as x and y values.
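The per-person output described above can be pictured as a nested structure. The sketch below is illustrative only (the field names are assumptions, not the tool's actual API), but it mirrors the score/keypoints layout and shows how an individual confidence threshold might be applied:

```python
# Hypothetical sketch of one detection result: an overall score plus a list
# of key points, each with its own score and (x, y) position.
pose = {
    "score": 0.98,  # overall person confidence, 0..1
    "keypoints": [
        {"part": "nose", "score": 0.99, "position": {"x": 276.5, "y": 207.7}},
        {"part": "leftEye", "score": 0.98, "position": {"x": 289.1, "y": 201.3}},
        # ... one entry per detected key point
    ],
}

def confident_keypoints(pose, threshold=0.5):
    """Return only the key points whose individual score meets the threshold."""
    return [kp for kp in pose["keypoints"] if kp["score"] >= threshold]
```

Filtering on the per-key-point score, rather than only the person score, lets you keep a confidently detected nose even when, say, an occluded ankle scores poorly.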

The following 17 key points are detected:

  1. Nose
  2. Left Eye
  3. Right Eye
  4. Left Ear
  5. Right Ear
  6. Left Shoulder
  7. Right Shoulder
  8. Left Elbow
  9. Right Elbow
  10. Left Wrist
  11. Right Wrist
  12. Left Hip
  13. Right Hip
  14. Left Knee
  15. Right Knee
  16. Left Ankle
  17. Right Ankle
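For reference, the 17 key points above can be written as an ordered list of names. The camelCase spellings below are an assumption based on the column prefixes used in the CSV export shown later on this page:

```python
# The 17 PoseNet key points, in the order listed above. The camelCase names
# follow the <part>_score / <part>_x / <part>_y CSV column convention.
KEYPOINT_NAMES = [
    "nose",
    "leftEye", "rightEye",
    "leftEar", "rightEar",
    "leftShoulder", "rightShoulder",
    "leftElbow", "rightElbow",
    "leftWrist", "rightWrist",
    "leftHip", "rightHip",
    "leftKnee", "rightKnee",
    "leftAnkle", "rightAnkle",
]
```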

The data is exported as a CSV file structured as follows:

| frame | person | score | nose_score | nose_x | nose_y | leftEye_score | leftEye_x | leftEye_y | leftAnkle_score | leftAnkle_x | leftAnkle_y | rightAnkle_score | rightAnkle_x | rightAnkle_y |
|-------|--------|-------|------------|--------|--------|---------------|-----------|-----------|-----------------|-------------|-------------|------------------|--------------|--------------|
| 0 | 0 | 0.288138187 | 0.984118104 | 276.5469314 | 207.6690355 | | | | 0.349789828 | 92.15140262 | 407.0926641 | 0.36024788 | 436.4799229 | 412.5408542 |
| 0 | 1 | 0.284447396 | 0.985967278 | 275.9498693 | 207.3298892 | 0.993024826 | 0.284447396 | 0.985967278 | 0.33608526 | 91.10668598 | 408.2515874 | 0.270381719 | 439.463522 | 414.1420876 |
| 1 | 0 | 0.288234481 | 0.983611763 | 275.5707123 | 209.5347086 | 0.98942554 | 0.288234481 | 0.983611763 | 0.267585278 | 94.98451764 | 395.7671663 | 0.546998918 | 437.8591133 | 411.9962876 |

The frame column represents the frame number in the video. Within each frame, every detected person appears as a row, identified by the person column, with a confidence score and a series of key points. For example, person 1 in frame 0 has a confidence score of 0.28, a nose confidence score of 0.99, and nose coordinates of (276, 207). Some key point fields will not contain any values, meaning the body part either was not detected by the model or did not meet the confidence threshold. So, in frame 0 of the table above, person 0's left eye was either not detected or fell below the threshold.
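When loading the export programmatically, the empty fields need to be handled explicitly. The following sketch assumes a comma-separated export with the column pattern shown above (the actual delimiter and column names in your file may differ), and maps blank fields to `None`:

```python
import csv
import io

# A two-row sample mimicking the export format: person 0 in frame 0 has no
# left-eye values, so those fields are blank.
SAMPLE = """frame,person,score,nose_score,nose_x,nose_y,leftEye_score,leftEye_x,leftEye_y
0,0,0.288,0.984,276.5,207.7,,,
0,1,0.284,0.986,275.9,207.3,0.993,289.0,201.1
"""

def read_poses(csv_text):
    """Parse rows, converting numeric fields to float and blanks to None."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({k: (float(v) if v != "" else None) for k, v in row.items()})
    return rows

poses = read_poses(SAMPLE)
```

Representing missing key points as `None` (rather than 0.0) keeps "not detected" distinguishable from a key point detected at the frame origin.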

You can adjust the confidence thresholds using the controller, along with a number of other model parameters, including:

  1. Architecture - determines which PoseNet architecture to load.
  2. Output Stride - the output stride of the PoseNet model. The smaller the value, the larger the output resolution and the more accurate the model, at the cost of speed.
  3. Input Resolution - the size the image is resized and padded to before it is fed into the PoseNet model. The larger the value, the more accurate the model, at the cost of speed.
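The speed/accuracy trade-offs above can be summarized as a configuration sketch. The parameter names and values below are assumptions for illustration (the controller's actual option names may differ); the stride and resolution choices reflect values commonly supported by PoseNet models:

```python
# Illustrative PoseNet configuration; names and values are assumptions,
# not the tool's exact settings.
config = {
    "architecture": "MobileNetV1",  # assumed option; a heavier architecture trades speed for accuracy
    "outputStride": 16,             # smaller stride -> larger output resolution, more accurate, slower
    "inputResolution": 257,         # larger input -> more accurate, slower
    "minPoseConfidence": 0.25,      # discard detected persons scoring below this
    "minPartConfidence": 0.5,       # discard individual key points scoring below this
}
```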

For more information on PoseNet, refer to this blog post.

Source Code