PoseNet

Pose Estimation

The PoseNet tool detects key body points in human figures using the PoseNet model. The model can run with either a single-person or a multi-person detection algorithm. The single-person detector is faster and simpler but requires that only one person be present in the frame, whereas the multi-person detector can detect many people at once but is slightly slower.

The model outputs a confidence score and a series of key points for each person in a frame of a video recording. The confidence score represents the probability that a person has been correctly detected. It ranges from 0 to 1, where 1 indicates the highest confidence. The key points represent body parts, each with its own confidence score and position. A key point's confidence score represents the probability that the key point has been accurately detected. A key point's position gives the coordinates of the point within the frame, expressed as x and y values.
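The per-person output described above can be pictured as a nested structure. The sketch below is illustrative only (the field names are assumptions, not the tool's actual API), but it mirrors the score/keypoints layout and shows how an individual confidence threshold might be applied:

```python
# Hypothetical sketch of one detection result: an overall score plus a list
# of key points, each with its own score and (x, y) position.
pose = {
    "score": 0.98,  # overall person confidence, 0..1
    "keypoints": [
        {"part": "nose", "score": 0.99, "position": {"x": 276.5, "y": 207.7}},
        {"part": "leftEye", "score": 0.98, "position": {"x": 289.1, "y": 201.3}},
        # ... one entry per detected key point
    ],
}

def confident_keypoints(pose, threshold=0.5):
    """Return only the key points whose individual score meets the threshold."""
    return [kp for kp in pose["keypoints"] if kp["score"] >= threshold]
```

Filtering on the per-key-point score, rather than only the person score, lets you keep a confidently detected nose even when, say, an occluded ankle scores poorly.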

The following 17 key points are detected:

  1. Nose
  2. Left Eye
  3. Right Eye
  4. Left Ear
  5. Right Ear
  6. Left Shoulder
  7. Right Shoulder
  8. Left Elbow
  9. Right Elbow
  10. Left Wrist
  11. Right Wrist
  12. Left Hip
  13. Right Hip
  14. Left Knee
  15. Right Knee
  16. Left Ankle
  17. Right Ankle
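For reference, the 17 key points above can be written as an ordered list of names. The camelCase spellings below are an assumption based on the column prefixes used in the CSV export shown later on this page:

```python
# The 17 PoseNet key points, in the order listed above. The camelCase names
# follow the <part>_score / <part>_x / <part>_y CSV column convention.
KEYPOINT_NAMES = [
    "nose",
    "leftEye", "rightEye",
    "leftEar", "rightEar",
    "leftShoulder", "rightShoulder",
    "leftElbow", "rightElbow",
    "leftWrist", "rightWrist",
    "leftHip", "rightHip",
    "leftKnee", "rightKnee",
    "leftAnkle", "rightAnkle",
]
```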

The data is exported as a CSV file structured as follows:

| frame | person | score | nose_score | nose_x | nose_y | leftEye_score | leftEye_x | leftEye_y | leftAnkle_score | leftAnkle_x | leftAnkle_y | rightAnkle_score | rightAnkle_x | rightAnkle_y |
|-------|--------|-------|------------|--------|--------|---------------|-----------|-----------|-----------------|-------------|-------------|------------------|--------------|--------------|
| 0 | 0 | 0.288138187 | 0.984118104 | 276.5469314 | 207.6690355 | | | | 0.349789828 | 92.15140262 | 407.0926641 | 0.36024788 | 436.4799229 | 412.5408542 |
| 0 | 1 | 0.284447396 | 0.985967278 | 275.9498693 | 207.3298892 | 0.993024826 | 0.284447396 | 0.985967278 | 0.33608526 | 91.10668598 | 408.2515874 | 0.270381719 | 439.463522 | 414.1420876 |
| 1 | 0 | 0.288234481 | 0.983611763 | 275.5707123 | 209.5347086 | 0.98942554 | 0.288234481 | 0.983611763 | 0.267585278 | 94.98451764 | 395.7671663 | 0.546998918 | 437.8591133 | 411.9962876 |

The frame column represents the frame number in the video. Within each frame, every detected person appears as a row, identified by the person column, with a confidence score and a series of key points. For example, person 1 in frame 0 has a confidence score of 0.28, a nose confidence score of 0.99, and nose coordinates of (276, 207). Some key point fields will not contain any values, meaning the body part either was not detected by the model or did not meet the confidence threshold. So, in frame 0 of the table above, person 0's left eye was either not detected or fell below the threshold.
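When loading the export programmatically, the empty fields need to be handled explicitly. The following sketch assumes a comma-separated export with the column pattern shown above (the actual delimiter and column names in your file may differ), and maps blank fields to `None`:

```python
import csv
import io

# A two-row sample mimicking the export format: person 0 in frame 0 has no
# left-eye values, so those fields are blank.
SAMPLE = """frame,person,score,nose_score,nose_x,nose_y,leftEye_score,leftEye_x,leftEye_y
0,0,0.288,0.984,276.5,207.7,,,
0,1,0.284,0.986,275.9,207.3,0.993,289.0,201.1
"""

def read_poses(csv_text):
    """Parse rows, converting numeric fields to float and blanks to None."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({k: (float(v) if v != "" else None) for k, v in row.items()})
    return rows

poses = read_poses(SAMPLE)
```

Representing missing key points as `None` (rather than 0.0) keeps "not detected" distinguishable from a key point detected at the frame origin.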

You can adjust the confidence thresholds using the controller, along with a number of other model parameters, including:

  1. Architecture - determines which PoseNet architecture to load.
  2. Output Stride - the output stride of the PoseNet model. The smaller the value, the larger the output resolution and the more accurate the model, at the cost of speed.
  3. Input Resolution - the size the image is resized and padded to before it is fed into the PoseNet model. The larger the value, the more accurate the model, at the cost of speed.
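The speed/accuracy trade-offs above can be summarized as a configuration sketch. The parameter names and values below are assumptions for illustration (the controller's actual option names may differ); the stride and resolution choices reflect values commonly supported by PoseNet models:

```python
# Illustrative PoseNet configuration; names and values are assumptions,
# not the tool's exact settings.
config = {
    "architecture": "MobileNetV1",  # assumed option; a heavier architecture trades speed for accuracy
    "outputStride": 16,             # smaller stride -> larger output resolution, more accurate, slower
    "inputResolution": 257,         # larger input -> more accurate, slower
    "minPoseConfidence": 0.25,      # discard detected persons scoring below this
    "minPartConfidence": 0.5,       # discard individual key points scoring below this
}
```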

For more information on PoseNet, refer to this blog post.

Source Code