PoseNet
Pose Estimation
The PoseNet tool detects key body points in human figures using the PoseNet model. The model runs with either a single-person or a multi-person detection algorithm. The single-person detector is faster and simpler but requires that only one person be present on screen, whereas the multi-person detector can detect many people at a slightly slower speed.
The model outputs a confidence score and a series of key points for each person in a frame of a video recording. The confidence score represents the probability that a person has been correctly detected; it ranges from 0 to 1, where 1 indicates the highest confidence. The key points represent body parts, each with its own confidence score and position. A key point’s confidence score represents the probability that the key point has been accurately detected, and its position gives the point’s coordinates in the frame as x and y values.
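The per-person output described above can be sketched as a plain data structure. The field names below (`score`, `keypoints`, `part`, `position`) mirror common PoseNet conventions but are illustrative rather than this tool’s exact API, and the 0.5 threshold is an assumed value:

```python
# Illustrative sketch of one detected person: an overall confidence score
# plus a list of key points, each with its own score and x/y position.

MIN_KEYPOINT_SCORE = 0.5  # assumed per-key-point confidence threshold

person = {
    "score": 0.93,  # overall detection confidence, 0..1
    "keypoints": [
        {"part": "nose", "score": 0.98, "position": {"x": 276.5, "y": 207.7}},
        {"part": "leftEye", "score": 0.35, "position": {"x": 92.2, "y": 407.1}},
    ],
}

# Keep only the key points detected with sufficient confidence.
confident = [kp for kp in person["keypoints"] if kp["score"] >= MIN_KEYPOINT_SCORE]
print([kp["part"] for kp in confident])  # → ['nose']
```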
The following 17 key points are detected:
- Nose
- Left Eye
- Right Eye
- Left Ear
- Right Ear
- Left Shoulder
- Right Shoulder
- Left Elbow
- Right Elbow
- Left Wrist
- Right Wrist
- Left Hip
- Right Hip
- Left Knee
- Right Knee
- Left Ankle
- Right Ankle
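For illustration, the CSV column layout can be derived from the 17 key points above; the camelCase part names (leftEye, rightAnkle, …) follow the column headers that appear in the export:

```python
# Derive the per-row column names: frame, person, and overall score,
# followed by score/x/y triples for each of the 17 key points.

PARTS = [
    "nose", "leftEye", "rightEye", "leftEar", "rightEar",
    "leftShoulder", "rightShoulder", "leftElbow", "rightElbow",
    "leftWrist", "rightWrist", "leftHip", "rightHip",
    "leftKnee", "rightKnee", "leftAnkle", "rightAnkle",
]

columns = ["frame", "person", "score"]
for part in PARTS:
    columns += [f"{part}_score", f"{part}_x", f"{part}_y"]

print(len(columns))  # 3 + 17 * 3 = 54 columns per row
```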
The data is exported as a CSV file structured as follows:
frame | person | score | nose_score | nose_x | nose_y | leftEye_score | leftEye_x | leftEye_y | … | leftAnkle_score | leftAnkle_x | leftAnkle_y | rightAnkle_score | rightAnkle_x | rightAnkle_y |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0.288138187 | 0.984118104 | 276.5469314 | 207.6690355 | | | | … | 0.349789828 | 92.15140262 | 407.0926641 | 0.36024788 | 436.4799229 | 412.5408542 |
0 | 1 | 0.284447396 | 0.985967278 | 275.9498693 | 207.3298892 | 0.993024826 | 0.284447396 | 0.985967278 | … | 0.33608526 | 91.10668598 | 408.2515874 | 0.270381719 | 439.463522 | 414.1420876 |
1 | 0 | 0.288234481 | 0.983611763 | 275.5707123 | 209.5347086 | 0.98942554 | 0.288234481 | 0.983611763 | … | 0.267585278 | 94.98451764 | 395.7671663 | 0.546998918 | 437.8591133 | 411.9962876 |
The frame column gives the frame number in the video, and each person detected within a frame is indexed in the person column. Each person is represented by a confidence score and a series of key points. For example, person 1 in frame 0 has a confidence score of 0.28, a nose confidence score of 0.99, and nose coordinates of (276, 207). Some key point fields will be empty, meaning that the body part was either not detected by the model or did not meet the confidence threshold. Thus, in frame 0 of the table above, person 0’s left eye was not detected.
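A minimal parsing sketch of such an export follows; the inline sample stands in for the CSV file (values abbreviated from the table above), and the handling of empty cells as "not detected" matches the description above:

```python
import csv
import io

# Inline sample standing in for an exported file; undetected key points
# are left as empty cells, as in the table above.
sample = """frame,person,score,nose_score,nose_x,nose_y,leftEye_score,leftEye_x,leftEye_y
0,0,0.288,0.984,276.5,207.7,,,
0,1,0.284,0.986,275.9,207.3,0.993,91.1,408.3
"""

rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    # An empty score cell means the part was not detected or fell below
    # the confidence threshold.
    detected = row["leftEye_score"] != ""
    print(f"frame {row['frame']}, person {row['person']}: left eye detected = {detected}")
```

When loading a real export, the same check generalizes to any `<part>_score` column.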
You can change the confidence thresholds using the controller, along with a number of other model parameters, including:
- Architecture - Determines which PoseNet architecture to load.
- Output Stride - Specifies the output stride of the PoseNet model. The smaller the value, the larger the output resolution and the more accurate the model, at the cost of speed.
- Input Resolution - Specifies the size the image is resized and padded to before it is fed into the PoseNet model. The larger the value, the more accurate the model, at the cost of speed.
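As a hypothetical sketch, the parameters above might be grouped into a configuration like the following; the names and candidate values here echo the underlying PoseNet library’s conventions but are assumptions, not this tool’s documented settings:

```python
# Hypothetical controller settings; names and value ranges are assumed.
model_config = {
    "architecture": "MobileNetV1",  # lighter/faster; a ResNet-style variant trades speed for accuracy
    "outputStride": 16,             # smaller stride -> larger output resolution, slower
    "inputResolution": 257,         # larger input -> more accurate, slower
    "minPoseConfidence": 0.25,      # threshold on the overall person score
    "minPartConfidence": 0.5,       # threshold on each key point's score
}
```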
For more information on PoseNet, refer to this blog post.