HandPose

Single-Hand Keypoint Detector


This module detects hand key points in video using HandPose. The model first predicts whether a video frame contains a hand. If so, it outputs a confidence score and a set of key points for the hand. The confidence score is the probability that the hand has been correctly detected; it ranges from 0 to 1, where 1 indicates a certain detection. There are 21 key points in total, representing the locations of the finger joints and the palm. Each key point's position is given as x, y, and z coordinates within the video frame.
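HandPose ships as a TensorFlow.js model, so individual frames can be run through the @tensorflow-models/handpose package directly. The sketch below assumes a browser environment with a playing <video> element; the detectHand helper is illustrative and not part of this module.

```ts
import '@tensorflow/tfjs'; // side-effect import: registers the WebGL/CPU backends
import * as handpose from '@tensorflow-models/handpose';

async function detectHand(video: HTMLVideoElement): Promise<void> {
  // Load the pretrained model (weights are downloaded on first call).
  const model = await handpose.load();

  // estimateHands returns one prediction per detected hand;
  // this model detects at most a single hand per frame.
  const predictions = await model.estimateHands(video);
  if (predictions.length === 0) return;

  const hand = predictions[0];
  // Probability (0..1) that a hand is actually present in the frame.
  console.log('handInViewConfidence:', hand.handInViewConfidence);

  // The 21 key points are grouped by finger in `annotations`;
  // each entry is an [x, y, z] triple in frame coordinates.
  const [x, y, z] = hand.annotations.indexFinger[0];
  console.log('index finger, first joint:', x, y, z);
}
```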

The data is exported as a CSV file structured as follows (intermediate key point columns elided with "…"):

| frame | handInViewConfidence | indexFinger_1_x | indexFinger_1_y | indexFinger_1_z | indexFinger_2_x | … | thumb_4_z | palmBase_x | palmBase_y | palmBase_z |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.986333966 | 322.4174449 | 168.304389 | -14.46016312 | 321.3518461 | … | -32.55603409 | 287.714139 | 272.9273902 | -0.00182613 |
| 1 | 0.986326456 | 322.4671148 | 166.8973452 | -14.31898689 | 320.9870967 | … | -33.10225296 | 287.6607924 | 271.9008103 | -0.001728684 |
| 2 | 0.986034989 | 321.2713459 | 167.3608348 | -13.81084251 | 320.5034114 | … | -32.29040146 | 286.5937991 | 272.8722047 | -0.001779698 |

The frame column gives the frame number in the video, and handInViewConfidence gives the detected hand's confidence score for that frame. The remaining columns are named [finger]_[joint]_[axis], except for the last three, which hold the palm key points and are named palmBase_[axis]. For example, the indexFinger_1_x column holds the x coordinate of the first joint of the index finger. This is further illustrated in the figure below.
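Given this naming scheme, a CSV header and per-frame rows can be assembled directly from the model's grouped annotations. The sketch below is an illustration, not this module's actual export code: the finger ordering in FINGERS and the toCsvRow helper are assumptions.

```ts
import * as handpose from '@tensorflow-models/handpose';

// Assumed column ordering: four joints per finger, three axes per joint,
// with the palm base last, matching the [finger]_[joint]_[axis] scheme.
const FINGERS = ['thumb', 'indexFinger', 'middleFinger', 'ringFinger', 'pinky'] as const;
const AXES = ['x', 'y', 'z'] as const;

const header = [
  'frame',
  'handInViewConfidence',
  ...FINGERS.flatMap(f => [1, 2, 3, 4].flatMap(j => AXES.map(a => `${f}_${j}_${a}`))),
  ...AXES.map(a => `palmBase_${a}`),
].join(',');

function toCsvRow(frame: number, hand: handpose.AnnotatedPrediction): string {
  const coords = [
    // Each finger annotation holds four [x, y, z] joints; flatten to 12 numbers.
    ...FINGERS.flatMap(f => hand.annotations[f].flat()),
    // palmBase holds a single [x, y, z] triple.
    ...hand.annotations.palmBase.flat(),
  ];
  return [frame, hand.handInViewConfidence, ...coords].join(',');
}
```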

For more details on HandPose, refer to Google AI's blog post.

Source Code