Multi-HandPose

Hand Keypoints Detector

RUN

This module detects key body points in hand figures using both the Handtrack.js and HandPose models. This module was created to overcome the limitations presented by the two models: Handtrack.js can only detect bounding boxes and not the actual finger joints and palm keypoints, whereas HandPose can only detect keypoints of a single hand at a time.

This module combines both Handtrack.js and HandPose to detect the key hand joints of multiple hands in a single frame. The model first applies Handtrack.js to a frame of a video recording to predict the bounding boxes of each hand. These bounding boxes are then individually supplied to the HandPose model, which detects whether a hand is actually present within the bounding box and, if so, predicts the key hand joints. The model outputs a confidence score and a series of key points for each detected hand. The confidence score represents the probability that the hand has been correctly detected. It ranges from a value of 0 to 1, where 1 represents an exact detection. There are a total of 21 key points, which represent the location of each finger joint and the palm. A key point’s position represents the coordinates of the point in a video frame and is expressed as x, y, and z values.

The data is exported as a CSV file structured as follows:

frame handInViewConfidence indexFinger_1_x indexFinger_1_y indexFinger_1_z indexFinger_2_x thumb_4_z palmBase_x palmBase_y palmBase_z
0 0.986333966 322.4174449 168.304389 -14.46016312 321.3518461 -32.55603409 287.714139 272.9273902 -0.00182613
1 0.986326456 322.4671148 166.8973452 -14.31898689 320.9870967 -33.10225296 287.6607924 271.9008103 -0.001728684
2 0.986034989 321.2713459 167.3608348 -13.81084251 320.5034114 -32.29040146 286.5937991 272.8722047 -0.001779698

The frame column represents the frame number in the video and the handInViewConfidence represents the detected hand’s confidence score in that frame. The rest of the columns are represented as [finger]_[joint]_[axis], except for the last three columns which are for the palm key points and are represented as palmBase_[axis]. So, for example, the indexFinger_1_x column represents the x coordinate of the first joint in the hand’s index finger. This is further illustrated in the figure below.

You can change confidence levels using the controller, along with a number of other model parameters including:

The HandTrack.js model often produces bounding boxes that are too tight for the HandPose model to be able to detect a hand within them. Therefore, by default, we scale the bounding boxes by a factor of 2. We have also provided the option to scale the width and height of the bounding boxes according to the user’s preference. There are also other variables that can be changed, including:

  1. Score Threshold - It determines which PoseNet architecture to load.
  2. Maximum number of boxes - It specifies the maximum number of boxes that the HandTrack.js model should detect.
  3. IoU Threshold - It clusters the bounding boxes by spatial closeness measured with IoU and keeps only the most confident bounding box among each cluster.

Source Code