Simplifying the palm into 21 joints essentially turns the palm into a point cloud model. The common practice is to place each joint in a spatial coordinate system and to apply combinatorial mathematics and social network graph theory for recognition [32]. When the 21 coordinate points change frequently, their aggregation pattern becomes extremely complex, which significantly increases the computational load of dynamic gesture recognition. To avoid traversing every coordinate point, this paper proposes a fast algorithm that reduces computational complexity while still ensuring reliable gesture recognition.
Hand joint recognition method
Hand key point recognition locates the positions and postures of the hand joints. By accurately identifying these key points, gesture changes can be located and tracked. Common approaches include graph model-based methods, template matching-based methods, and feature point detection-based methods. In recent years, with the optimization and upgrading of the OpenCV library, the accuracy and efficiency of deep learning-based hand joint recognition have improved significantly, with an average recognition time for hand joints of less than 1 ms, which provides technical support for HCI control. This paper uses the “MediaPipe.hands” gesture detection module together with OpenCV. MediaPipe is an open-source framework provided by Google; its hand model is pre-trained and can recognize finger posture changes in real time with good robustness.
The “MediaPipe.hands” gesture detection model numbers each joint, as shown in Fig. 3. Point 0 represents the center of the wrist. When a finger moves, the gesture can be preliminarily classified from the set of joints that changes position. A classifier algorithm is therefore needed to implement this function.
Fig. 3. Recognition of hand joints.
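The joint numbering can be captured in a small lookup table. The following is a minimal sketch, assuming the standard MediaPipe Hands landmark numbering (0 for the wrist, then four joints per finger from thumb to little finger, with tips at 4, 8, 12, 16 and 20). The names `WRIST`, `FINGER_JOINTS` and `finger_of` are illustrative and are not part of any library.

```python
# Landmark indices follow the MediaPipe Hands numbering (0 = wrist).
# All names below are hypothetical helpers for illustration only.
WRIST = 0
FINGER_JOINTS = {
    "thumb": (1, 2, 3, 4),
    "index": (5, 6, 7, 8),
    "middle": (9, 10, 11, 12),
    "ring": (13, 14, 15, 16),
    "pinky": (17, 18, 19, 20),
}


def finger_of(joint_index):
    """Return the finger a joint index belongs to, or 'wrist' for point 0."""
    if joint_index == WRIST:
        return "wrist"
    for name, joints in FINGER_JOINTS.items():
        if joint_index in joints:
            return name
    raise ValueError(f"joint index out of range: {joint_index}")
```

With this table, the fingertip of each finger is the last entry of its tuple, for example `FINGER_JOINTS["index"][-1]` for point 8.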
Classifier recognition algorithm
Currently, the main classifier recognition algorithms include logistic regression, naive Bayes, K-nearest neighbors, decision trees, and support vector machines (SVM). The hand joints, treated as discrete points on a two-dimensional coordinate plane, pose a two-class linear classification problem, and SVM is particularly well suited to binary linear classification of discrete targets.
The SVM algorithm is also applicable to machine learning problems with small samples, and it simplifies common tasks such as classification and regression. Specifically, when the discrete points are linearly separable, the algorithm can find the optimal separating hyperplane for the two classes of samples in the original space. The classifier function \(g(x)\) is expressed as follows.
$$g(x) = ax + b$$
(1)
In Eq. (1), \(a\) is the slope and \(b\) is the intercept.
For typical binary discrete points, if the function \(g_{k}(x) = a_{k} x + b_{k}\) \((k = 0,1,2,3)\) is the diagonal line in Fig. 4, the red and blue points are strictly separated by this line. By the same principle, when the hand makes a fist, all points fall below the diagonal line and form one group, with no discrete points above the line, as shown in Fig. 4(1). When the index finger is extended, the function places three points of the index finger (red) above the diagonal line and the remaining points (blue) below it, as shown in Fig. 4(2). In the same way, we can distinguish between extending two fingers and extending three fingers, as shown in Fig. 4(3) and Fig. 4(4).
Fig. 4. Gesture recognition through classifier.
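The separation of joints by the line \(g(x) = ax + b\) can be sketched as follows. This is an illustrative example only: the function names are hypothetical, the sample coordinates are made up, and the sketch assumes a plane in which \(y\) increases upward (in raw image coordinates, where \(y\) grows downward, the comparison would be reversed).

```python
def g(x, a, b):
    """Decision line g(x) = a*x + b, with slope a and intercept b."""
    return a * x + b


def split_by_line(points, a, b):
    """Split (x, y) points into those strictly above the line and the rest."""
    above = [(x, y) for x, y in points if y > g(x, a, b)]
    below = [(x, y) for x, y in points if y <= g(x, a, b)]
    return above, below


# Made-up coordinates: three "finger" points above y = x, two "palm" points below.
sample_points = [(0.2, 0.1), (0.4, 0.3), (0.3, 0.6), (0.35, 0.8), (0.4, 0.95)]
above, below = split_by_line(sample_points, a=1.0, b=0.0)
```

An empty `above` list would correspond to the fist case, where no joint lies above the line.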
As the hand moves and the gesture changes, the coordinates of the 21 joints on each hand change continuously. Determining \(a\) and \(b\) dynamically is therefore another key step, so that the function \(g(x) = ax + b\) can actually be used.
According to the principle of least squares, the smaller the deviation between the samples and their expected values, the better; that is, the weighted sum of squared deviations between observed and expected values is minimized. For a linear fit of equally precise observations, such as the fingertip points in this paper, all weights are equal and we minimize the following expression.
$$\sum_{i=1}^{N} \left[ y_{i} - (a x_{i} + b) \right]^{2}$$
(2)
Taking the partial derivatives of Eq. (2) with respect to \(a\) and \(b\) and setting them to zero, we obtain the following equations.
$$\frac{\partial}{\partial a}\sum_{i=1}^{N} \left[ y_{i} - (a x_{i} + b) \right]^{2} = -2\sum_{i=1}^{N} \left[ y_{i} - (a x_{i} + b) \right] x_{i} = 0$$
(3)
$$\frac{\partial}{\partial b}\sum_{i=1}^{N} \left[ y_{i} - (a x_{i} + b) \right]^{2} = -2\sum_{i=1}^{N} \left[ y_{i} - (a x_{i} + b) \right] = 0$$
(4)
After rearrangement, the following system of equations is obtained.
$$\left\{ \begin{array}{l} a\sum\limits_{i=1}^{N} x_{i}^{2} + b\sum\limits_{i=1}^{N} x_{i} = \sum\limits_{i=1}^{N} x_{i} y_{i} \\ a\sum\limits_{i=1}^{N} x_{i} + bN = \sum\limits_{i=1}^{N} y_{i} \end{array} \right.$$
(5)
Solving this system gives the optimal estimates of the slope \(a\) and the intercept \(b\) in \(g(x)\):
$$\hat{a} = \frac{N\left( \sum\limits_{i=1}^{N} x_{i} y_{i} \right) - \left( \sum\limits_{i=1}^{N} x_{i} \right)\left( \sum\limits_{i=1}^{N} y_{i} \right)}{N\left( \sum\limits_{i=1}^{N} x_{i}^{2} \right) - \left( \sum\limits_{i=1}^{N} x_{i} \right)^{2}}$$
(6)
$$\hat{b} = \frac{\left( \sum\limits_{i=1}^{N} x_{i}^{2} \right)\left( \sum\limits_{i=1}^{N} y_{i} \right) - \left( \sum\limits_{i=1}^{N} x_{i} \right)\left( \sum\limits_{i=1}^{N} x_{i} y_{i} \right)}{N\left( \sum\limits_{i=1}^{N} x_{i}^{2} \right) - \left( \sum\limits_{i=1}^{N} x_{i} \right)^{2}}$$
(7)
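The closed-form least-squares solution can be sketched in a few lines of Python. This is an illustrative implementation under the convention of Eq. (1), in which \(a\) is the slope and \(b\) is the intercept of \(g(x) = ax + b\); the function name `fit_line` and the sample data are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for y = a*x + b.

    a is the slope and b the intercept, following Eq. (1).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        # All x values coincide: the line is vertical and has no finite slope.
        raise ValueError("x values are all equal; slope is undefined")
    a = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_xx * sum_y - sum_x * sum_xy) / denom
    return a, b


# Made-up points lying exactly on y = 2x + 1.
a_hat, b_hat = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

The explicit check on the denominator matters in practice: if all sampled joints share the same \(x\) coordinate, no finite slope exists and the fit should be skipped for that frame.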
To check whether the line defined by \(\hat{a}\) and \(\hat{b}\) reflects a genuine linear relationship between \(x_{i}\) and \(y_{i}\), a correlation analysis is performed, with the correlation coefficient denoted \(r\). It is calculated as follows:
$$r = \frac{\sum\limits_{i=1}^{N} \left( x_{i} - \overline{x} \right)\left( y_{i} - \overline{y} \right)}{\sqrt{\sum\limits_{i=1}^{N} \left( x_{i} - \overline{x} \right)^{2}} \sqrt{\sum\limits_{i=1}^{N} \left( y_{i} - \overline{y} \right)^{2}}}$$
(8)
In Eq. (8), \(\overline{x} = \sum\limits_{i=1}^{N} x_{i} / N\), \(\overline{y} = \sum\limits_{i=1}^{N} y_{i} / N\), and \(r \in [-1, 1]\). The closer \(\left| r \right|\) is to 1, the closer the relationship between \(x_{i}\) and \(y_{i}\) is to linear.
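The correlation check can be sketched as below. This is a minimal example of the standard Pearson correlation coefficient, which is what Eq. (8) describes; the function name `pearson_r` and the sample data are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant sequence has no spread, so r is undefined.
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# Made-up data: one perfectly increasing and one perfectly decreasing line.
r_up = pearson_r([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
r_down = pearson_r([0.0, 1.0, 2.0, 3.0], [7.0, 5.0, 3.0, 1.0])
```

A threshold on \(\left| r \right|\) could then decide whether a fitted line is trusted; the paper does not state such a threshold, so any specific value would be an assumption.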
Left hand gesture recognition
The classifier algorithm above separates the finger joints from the other palm joints. To further determine which finger is extended, this paper uses relative distance. Measuring the similarity between two points is typically the main task of feature recognition, and common measures include the ‘Euclidean distance’ and the ‘Mahalanobis distance’. Of these, the ‘Euclidean distance’ is the most direct expression of distance and is suitable for measuring the distance between two points [33]. The ‘Euclidean distance’ does not consider the relationships among vectors or dimensions, and it weights every dimension equally. This matches the characteristics of hand joint distances well, which is why the ‘Euclidean distance’ transformation is widely used in digital image processing and is especially suitable for measuring skeletal joints in human body images.
Let \(X_{i}\) be the \(m\)-dimensional feature vector of individual \(i\), let \(X_{ik}\) be the \(k\)-th component of \(X_{i}\), and let \(\left| D_{i,j} \right|\) be the ‘Euclidean distance’ between individual \(i\) and individual \(j\). Then \(\left| D_{i,j} \right|\) can be expressed as:
$$\left| D_{i,j} \right| = \sqrt{\sum\limits_{k=1}^{m} (X_{ik} - X_{jk})^{2}}, \quad i, j = 1, 2, \ldots, n;\; k = 1, 2, \ldots, m$$
(9)
Because the left hand is used to select and confirm the controlled parts of the spraying robot, this paper makes its judgments from the distance between the left finger joints and the wrist center, point 0. The 21 hand joints are projected onto a two-dimensional coordinate plane, and the position change of a joint is expressed by its x-axis and y-axis coordinate values. The posture of the left hand can be determined through the following algorithm.
(1) Extend the index finger of the left hand
$$d_{i} = \sqrt{(p_{ix} - p_{0x})^{2} + (p_{iy} - p_{0y})^{2}}, \quad i = 1, 2, \ldots, 20$$
(10)
\(d_{i}\) is the distance from any joint on the finger to the wrist point 0.
\(p_{ix}\) and \(p_{iy}\) are the coordinates of the \(i\)th joint on the x-axis and y-axis respectively.
\(p_{0x}\) and \(p_{0y}\) are the coordinates of the wrist at point 0 on the x-axis and y-axis respectively.
When \(d_{8}\) is greater than the distance from all other joints to point 0, the left hand is determined to be in the index finger pointing gesture, as shown in Fig. 4(2).
(2) Extend the index and middle fingers of the left hand
$$d_{j,k} = \sqrt{(p_{jx} - p_{0x})^{2} + (p_{jy} - p_{0y})^{2}} + \sqrt{(p_{kx} - p_{0x})^{2} + (p_{ky} - p_{0y})^{2}}, \quad j, k = 1, 2, \ldots, 20;\; j \ne k$$
(11)
\(p_{jx}\) and \(p_{jy}\) are the coordinates of the jth joint on the x-axis and y-axis respectively.
\(p_{kx}\) and \(p_{ky}\) are the coordinates of the kth joint on the x-axis and y-axis respectively.
In Eq. (11), when \(j = 8, k = 12\) and \(d_{8,12}\) is greater than the summed distance of any other pair of joints to point 0, the left hand is determined to be extending the index finger and middle finger, as shown in Fig. 4(3).
(3) Extend three fingers with the left hand
$$d_{q,m,n} = \sqrt{(p_{qx} - p_{0x})^{2} + (p_{qy} - p_{0y})^{2}} + \sqrt{(p_{mx} - p_{0x})^{2} + (p_{my} - p_{0y})^{2}} + \sqrt{(p_{nx} - p_{0x})^{2} + (p_{ny} - p_{0y})^{2}}, \quad q, m, n = 1, 2, \ldots, 20;\; q \ne m,\; m \ne n,\; q \ne n$$
(12)
\(p_{qx}\) and \(p_{qy}\) are the coordinates of the qth joint on the x-axis and y-axis respectively.
\(p_{mx}\) and \(p_{my}\) are the coordinates of the mth joint on the x-axis and y-axis respectively.
\(p_{nx}\) and \(p_{ny}\) are the coordinates of the nth joint on the x-axis and y-axis respectively.
In Eq. (12), when \(q = 12, m = 16, n = 20\) and \(d_{q,m,n}\) is greater than the summed distance of any other three joints to point 0, the left hand is determined to be extending three fingers, as shown in Fig. 4(4).
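The three distance rules above can be sketched with one helper. This is a minimal illustration rather than the paper's implementation: it assumes `landmarks` is a list of 21 \((x, y)\) pairs indexed as in Fig. 3, and it reads each rule as "the named joints have a larger summed wrist distance than any other combination of the same size", which is an interpretation of the text. The names `wrist_distances` and `is_extended_set` are hypothetical.

```python
import math
from itertools import combinations

WRIST = 0  # point 0 in the joint numbering


def wrist_distances(landmarks):
    """Eq. (10): Euclidean distance from every joint to wrist point 0."""
    x0, y0 = landmarks[WRIST]
    return [math.hypot(x - x0, y - y0) for x, y in landmarks]


def is_extended_set(landmarks, joints):
    """True if the summed wrist distance of `joints` strictly exceeds that of
    every other same-sized combination drawn from joints 1..20."""
    if len(landmarks) != 21:
        raise ValueError("expected 21 hand landmarks")
    d = wrist_distances(landmarks)
    target = frozenset(joints)
    best = sum(d[i] for i in target)
    return all(
        best > sum(d[i] for i in combo)
        for combo in combinations(range(1, 21), len(target))
        if frozenset(combo) != target
    )


# Made-up landmarks: every joint close to the wrist except the index fingertip.
demo = [(0.0, 0.0)] + [(0.0, 0.1)] * 20
demo[8] = (0.0, 1.0)
index_only = is_extended_set(demo, [8])        # rule (1)
index_middle = is_extended_set(demo, [8, 12])  # rule (2)
```

Rule (3) would be checked with `is_extended_set(landmarks, [12, 16, 20])`. Because rule (1) can also hold while two fingers are extended, a combined classifier would need to test the larger joint sets first; that ordering is a design choice not specified in the text.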
Random or unconscious gestures may produce the same recognition results, so operators are required to perform the correct gestures as carefully as possible to obtain accurate and expected outcomes.
Right-hand movement algorithm
The extraction method for the right index finger joint is similar to that of the left hand. However, since the right hand is needed to drive the robot’s movement, we only need to consider the right index finger’s tip point 8 separately. By identifying the movement coordinates of point 8, we can obtain the control commands.
If the coordinate of the right index finger’s tip point 8 is \((x_{i} ,y_{i} )\) at the initial moment, as the right index finger moves, the coordinate of point 8 changes to \((x_{j} ,y_{j} )\). Here, we classify the movement of the right index finger into two motion patterns:
(1) Rotation of the right index finger.
Assume that the swing trajectory of point 8 is approximately an arc, as shown by the dashed line in Fig. 5. As the finger rotates, however, the wrist also moves slightly, so the wrist point \(O^{\prime}\) also shifts in the XY plane. To solve this problem, we use the absolute coordinate origin \(O\) to calculate the rotation angle of hand point 8, as shown in Eq. (13).
$$\alpha_{j} = \left| \arctan(y_{j}/x_{j}) - \arctan(y_{i}/x_{i}) \right|$$
(13)
Fig. 5. Schematic diagram of right index finger motion trajectory calculation.
(2) Horizontal movement of right index finger.
When the right index finger moves in translation, the coordinate value \((x_{j} ,y_{j} )\) continuously updates. We only need to calculate the distance between point \((x_{i} ,y_{i} )\) and point \((x_{j} ,y_{j} )\) to determine the distance the controlled object needs to move.
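Both motion patterns of point 8 can be sketched as follows. This is an illustrative example: the function names are hypothetical, and `math.atan2(y, x)` is used in place of \(\arctan(y/x)\) from Eq. (13) to avoid division by zero when \(x = 0\). The two agree whenever \(x > 0\), which is assumed here because the coordinates are measured from the absolute origin \(O\).

```python
import math


def rotation_angle(p_i, p_j):
    """Rotation angle of point 8 about the origin O, in radians (Eq. (13)).

    Uses atan2(y, x), which equals arctan(y / x) for x > 0.
    """
    x_i, y_i = p_i
    x_j, y_j = p_j
    return abs(math.atan2(y_j, x_j) - math.atan2(y_i, x_i))


def translation_distance(p_i, p_j):
    """Euclidean distance moved by point 8 between two moments."""
    x_i, y_i = p_i
    x_j, y_j = p_j
    return math.hypot(x_j - x_i, y_j - y_i)


# Made-up coordinates for the initial and current positions of point 8.
alpha = rotation_angle((1.0, 0.0), (1.0, 1.0))
shift = translation_distance((0.0, 0.0), (3.0, 4.0))
```

How these two quantities are mapped to robot commands (for example, scaling factors or dead zones) is not specified in this section and would need to be chosen separately.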






