Construction of intelligent gymnastics teaching model based on neural network and artificial intelligence



Experimental environment and parameter settings

The experiment is conducted in a high-performance computing environment, and the specific experimental environment configuration is displayed in Table 1.

Table 1 Configuration of experimental environment.

The model parameters are configured as follows. The input layer of the ANN receives the 3D coordinate data of the 20 skeletal points captured by the Kinect sensor; the first and second hidden layers contain 128 and 64 neurons, respectively; and the output layer corresponds to 10 movement categories, using ReLU and Softmax activation functions. The HMM is configured with five hidden states, takes the 3D coordinate sequences as observations, and its parameters are optimized by the Baum-Welch algorithm. In data processing, the collected skeletal data are first normalized to eliminate the influence of body shape and range of movement. To capture temporal dynamics, a sliding-window approach is then used to extract time-series features, with a window length of 10 frames and a step size of 5 frames. Accuracy, recall, and F1-score serve as the evaluation metrics for classification performance, and the system’s real-time behavior is also examined to verify that the response time does not exceed one second. The experimental results show that the scheme combining ANN and HMM performs well in the movement recognition test, improving both the accuracy and the intelligence of the system.
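As a rough illustration of the network settings above, the following sketch builds the 60-input (20 joints × 3 coordinates), 128/64-hidden, 10-output forward pass in plain NumPy. The weights here are random placeholders, not trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes follow the text: 20 joints x 3 coords = 60 inputs,
# hidden layers of 128 and 64 neurons, 10 movement classes.
sizes = [60, 128, 64, 10]
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # numerically stable
    return e / e.sum(axis=-1, keepdims=True)

def forward(x):
    """ReLU on both hidden layers, Softmax on the output layer."""
    for w, b in zip(weights[:-1], biases[:-1]):
        x = relu(x @ w + b)
    return softmax(x @ weights[-1] + biases[-1])

batch = rng.normal(size=(4, 60))   # 4 skeleton frames
probs = forward(batch)             # (4, 10) class probabilities
```

Each output row is a probability distribution over the 10 movement categories; in training, the class with the highest Softmax probability is taken as the prediction.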

The University of Texas Kinect Action 3D (UTKinectAction3D) dataset is used in the experiment. This public dataset is designed for human action recognition research and includes 10 action categories, such as waving, pushing, and kicking. It contains 3D skeletal data collected by a Kinect sensor, with each sample recording the coordinates of 20 joints in 3D space. Each action is performed multiple times by different individuals, with variations in style and speed. A total of 30 participants are involved, each repeating every action 10 times, and data collection lasted 3 months. After preprocessing, the data is split into training and testing sets at a 7:3 ratio, ensuring that the model is trained and evaluated on representative samples.
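The normalization and sliding-window preprocessing described above can be sketched as follows; the joint indices and the torso-length scaling are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def normalize_skeleton(frames):
    """Center each frame on the hip joint and scale by torso length
    to reduce body-size effects (joint indices are illustrative)."""
    HIP, NECK = 0, 2
    centered = frames - frames[:, HIP:HIP + 1, :]
    scale = np.linalg.norm(centered[:, NECK, :], axis=1, keepdims=True)
    return centered / scale[:, None, :].clip(min=1e-8)

def sliding_windows(seq, win=10, step=5):
    """Overlapping windows: length 10 frames, stride 5, as in the text."""
    return np.stack([seq[i:i + win] for i in range(0, len(seq) - win + 1, step)])

rng = np.random.default_rng(0)
frames = rng.random((60, 20, 3))    # 60 frames, 20 joints, xyz
windows = sliding_windows(normalize_skeleton(frames))
# windows.shape is (11, 10, 20, 3): floor((60 - 10) / 5) + 1 = 11 windows
```

A 60-frame sequence thus yields 11 overlapping windows, each of which becomes one time-series sample for the recognizer.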

Performance evaluation

The evaluation metrics used in this study are accuracy, precision, and recall37,38. Given the test set \(\:\left\{\left({x}^{\left(1\right)},{y}^{\left(1\right)}\right),\dots\:,\left({x}^{\left(N\right)},{y}^{\left(N\right)}\right)\right\}\), define the real label \(\:{y}^{\left(n\right)}\in\:\{1,\dots\:,C\}\), and use \(\:f(x;\theta\:)\) to predict every sample in the test set, yielding the results \(\:\{{\widehat{y}}^{\left(1\right)},\dots\:,{\widehat{y}}^{\left(N\right)}\}\). For category c, the results of the model on the test set can be divided into True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN)39.

TP indicates that the real category of a sample is c, and the category correctly predicted by the model is c. The number of such samples is recorded as:

$$\:T{P}_{c}={\sum\:}_{n=1}^{N}I({y}^{\left(n\right)}={\widehat{y}}^{\left(n\right)}=c)$$

(9)

FP indicates that the real category of a sample is other categories, and the model incorrectly predicts it as category c. The number of such samples is recorded as:

$$\:F{P}_{c}={\sum\:}_{n=1}^{N}I({y}^{\left(n\right)}\ne\:c\wedge\:{\widehat{y}}^{\left(n\right)}=c)$$

(10)

TN indicates that the real category of a sample is another category, and the model also predicts another category. The number of such samples is recorded as \(\:T{N}_{c}={\sum\:}_{n=1}^{N}I({y}^{\left(n\right)}\ne\:c\wedge\:{\widehat{y}}^{\left(n\right)}\ne\:c)\). For category c, this case generally does not require separate attention.

FN indicates that the real category of a sample is c, and the model incorrectly predicts other categories. The number of such samples is recorded as:

$$\:F{N}_{c}={\sum\:}_{n=1}^{N}I\left({y}^{\left(n\right)}=c\wedge\:{\widehat{y}}^{\left(n\right)}\ne\:c\right)$$

(11)

\(\:I\left(\cdot\:\right)\) is the indicator function.

Accuracy: The ratio of the number of correctly predicted samples to the total number of samples, and the calculation equation is as follows:

$$\:Accuracy=\frac{1}{N}\sum\:_{n=1}^{N}I({y}^{\left(n\right)}={\widehat{y}}^{\left(n\right)})=\frac{TP+TN}{TP+TN+FP+FN}$$

(12)

Precision: The precision of category \(\:c\) is the correct proportion of all samples predicted as category \(\:c\), indicating whether the prediction is accurate or not. There are two possibilities to predict category \(\:c\). One is to predict category \(\:c\) as category \(\:c\left(T{P}_{c}\right)\), and the other is to predict other categories as category \(\:c\left(F{P}_{c}\right)\). The precision is expressed by the following equation:

$$\:Precision=\frac{T{P}_{c}}{T{P}_{c}+{FP}_{c}}$$

(13)

Recall rate: The recall rate of category \(\:c\) is the proportion of correctly predicted samples among all samples whose real label is category \(\:c\), indicating how complete the prediction is40. There are two possibilities for samples whose real label is category \(\:c\). One is that they are correctly predicted as category \(\:c\left(T{P}_{c}\right)\), and the other is that they are incorrectly predicted as other categories \(\:\left(F{N}_{c}\right)\). The equation of the recall rate is:

$$\:Recall=\frac{T{P}_{c}}{T{P}_{c}+F{N}_{c}}$$

(14)

The F1-score is a crucial metric for assessing a classification model’s performance since it balances the model’s recall and precision.

$$\:\text{F1-score}=2\times\:\frac{\text{Precision}\times\:\text{Recall}}{\text{Precision}+\text{Recall}}$$

(15)
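The indicator-sum definitions in Eqs. (9)–(15) translate directly into code. The following minimal sketch computes accuracy and per-class precision, recall, and F1-score on toy labels:

```python
import numpy as np

def per_class_metrics(y_true, y_pred, c):
    """TP/FP/FN counts via indicator sums, as in Eqs. (9)-(11),
    then precision, recall, and F1 for class c, Eqs. (13)-(15)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == c) & (y_pred == c))
    fp = np.sum((y_true != c) & (y_pred == c))
    fn = np.sum((y_true == c) & (y_pred != c))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
accuracy = np.mean(np.asarray(y_true) == np.asarray(y_pred))  # Eq. (12): 4/6
p, r, f1 = per_class_metrics(y_true, y_pred, c=1)  # precision 2/3, recall 1.0
```

Here class 1 has two true positives, one false positive, and no false negatives, giving a precision of 2/3, a recall of 1.0, and an F1-score of 0.8.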

Accuracy analysis of movement recognition based on ANN

In this section, the experimental results evaluating the performance of the ANN architecture on the movement recognition task are presented. The effects of different model structures and data processing techniques on accuracy, recall, and F1-score are analyzed. One-, two-, and three-layer neural networks are examined, with both raw and normalized input data. The performance analysis results of the ANN model are displayed in Fig. 6.

Fig. 6

Performance analysis of ANN model.

Figure 6 shows that the single-layer model with 64 neurons trained on raw data achieves an accuracy of 84.5%, a recall of 82.1%, and an F1-score of 83.3%. After data normalization, the accuracy improves to 87.6%, with corresponding gains in recall and F1-score. The same trend holds for the two-layer and three-layer models; in particular, the normalized-data model with three hidden layers of [256, 128, 64] neurons reaches an accuracy of 95.0%, a recall of 93.1%, and an F1-score of 94.0%. These results show that both increasing network depth and effective data preprocessing are important for improving movement recognition performance.

Influence of HMM’s number of hidden states on recognition performance

The impact of HMM’s hidden state count on movement detection performance is covered in this section. The results of movement recognition performance of HMM under different hidden states are shown in Fig. 7.

Fig. 7

Performance results of HMM’s movement recognition under different hidden states.

The data in Fig. 7 show that with 5 hidden states, the accuracy reaches 95.0%, the response time is 6.4 s, the recall is 93.1%, and the F1-score is 94.0%. When the number of hidden states increases to 7, the accuracy improves further to 96.3%, while the response time also rises slightly to 6.8 s. The effect of normalization is also significant: the normalized model with seven hidden states achieves the highest accuracy of 97.5% and the best F1-score of 96.8%.
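For reference, the per-sequence likelihood underlying such hidden-state comparisons is computed with the HMM forward algorithm. The sketch below uses uniform transitions and placeholder emission probabilities as assumptions; the paper's Baum-Welch training would alternate this forward pass with a backward pass to re-estimate the parameters:

```python
import numpy as np

def logsumexp(x, axis=None):
    """Numerically stable log-sum-exp."""
    m = np.max(x, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis) if axis is not None else float(out)

def hmm_forward_loglik(obs_logprob, log_A, log_pi):
    """Forward algorithm in log space.

    obs_logprob: (T, S) per-frame emission log-probabilities
    log_A:       (S, S) state-transition log-probabilities
    log_pi:      (S,)   initial-state log-probabilities
    Returns log P(observation sequence | model).
    """
    log_alpha = log_pi + obs_logprob[0]
    for t in range(1, len(obs_logprob)):
        log_alpha = obs_logprob[t] + logsumexp(log_alpha[:, None] + log_A, axis=0)
    return logsumexp(log_alpha)

# Toy setup: 5 hidden states (as in the experiments), uniform
# transitions and priors; emission probabilities are placeholders.
S, T = 5, 10
rng = np.random.default_rng(0)
obs_logprob = np.log(rng.uniform(0.1, 1.0, (T, S)))
log_A = np.full((S, S), -np.log(S))
log_pi = np.full(S, -np.log(S))
loglik = hmm_forward_loglik(obs_logprob, log_A, log_pi)
```

Increasing the number of hidden states enlarges `log_A` and `obs_logprob`, which is why accuracy can improve at the cost of a longer response time.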

Comparison between ANN-HMM-GNN combination and other algorithms

In this section, the proposed ANN-HMM-GNN combination is compared with several other machine learning algorithms, including Transformer, GNN, Temporal Convolutional Network (TCN), Random Forest (RF), Support Vector Machine (SVM), CNN, and LSTM. The model performance comparison results are shown in Fig. 8.

Fig. 8

Model performance comparison results.

The ANN-HMM-GNN model performs the best in terms of accuracy, recall, and F1-score, achieving 98.2%, 97.5%, and 97.8%, respectively. This indicates that the model has high accuracy and robustness in gymnastics action recognition tasks. In comparison, the Transformer model shows an accuracy of 97.0%, recall of 96.3%, and F1-score of 96.6%, which is also quite impressive. The performance of GNN and TCN models is slightly lower than that of the ANN-HMM-GNN model but still outperforms other traditional algorithms. Traditional algorithms like SVM, RF, and CNN show slightly lower accuracy and recall, while the LSTM model, despite excelling in time series data processing, still lags the ANN-HMM-GNN model. Overall, the ANN-HMM-GNN model has a significant advantage in comprehensive performance and is more effective in capturing the spatiotemporal features of gymnastics actions, providing strong technical support for intelligent gymnastics teaching.

Analysis of recognition rate of different gymnastics movements

At the end of this section, this study shows the recognition rate of various gymnastics movements, including standing, sitting, stretching arms, turning, bending and jumping. The recognition rate analysis results of each different gymnastics movement are shown in Fig. 9.

Fig. 9

Analysis results of recognition rate of different gymnastics movements.

In this set of data, every movement is recognized well. The jumping movement has the highest recognition rate at 98.0%, demonstrating the model’s strong ability to recognize this dynamic movement. Arm stretching also performs well, with a recognition rate of 97.1%, indicating that its features are extracted effectively. The recognition rates of the sitting and standing movements are 95.3% and 96.5%, respectively, showing good accuracy. In terms of rejection rate, the bending movement has the highest rejection rate at 5.0%, which may be related to the ambiguity of its movement characteristics.

To further verify the classification performance of the model, this study generates a confusion matrix on the test set, as shown in Table 2. The confusion matrix can intuitively show the classification of the model in each action category, including the number of correctly classified and misclassified samples.

Table 2 Confusion matrix of model on test set (unit: number of samples).

The confusion matrix shows that the model performs well on most action categories, especially in the recognition accuracy of jumping and arm stretching actions. However, the misclassification rate of bending actions is relatively high, mainly concentrated in the confusion with sitting and turning actions. This may be because bending actions and other actions have partial overlap in spatial features, making it difficult for the model to fully distinguish them. In addition, there is a small amount of misclassification between standing and sitting actions, which may be due to the similar static characteristics of the actions. Through the qualitative analysis of the confusion matrix, the performance of the model can be more comprehensively evaluated, and directions for subsequent optimization can be provided. For example, for the recognition problem of bending actions, more refined feature extraction methods or increasing training data can be tried to improve the model’s discrimination ability.
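A confusion matrix like the one in Table 2 can be built directly from the label pairs; the sketch below uses toy labels for three hypothetical classes:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows = true class, columns = predicted class; entry (i, j)
    counts samples of class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Toy labels for three classes (e.g. sitting, bending, turning).
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 0, 1, 0, 2, 2]
cm = confusion_matrix(y_true, y_pred, 3)
# Off-diagonal entries such as cm[1, 0] and cm[1, 2] expose
# confusions of the kind discussed above (bending vs. sitting/turning).
```

The diagonal holds the correctly classified counts; row sums give the number of true samples per class, which makes per-class recall easy to read off.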

Model complexity description

In this study, the complexity of the proposed ANN-HMM-GNN model and other common classifiers is analyzed in detail. Complexity is mainly reflected in training time and inference time; the results are shown in Table 3.

Table 3 Complexity comparison of different models.

Table 3 shows that the ANN-HMM-GNN model has the longest training time, approximately 12 h, because it requires training and optimizing three separate models. However, in terms of inference time, the ANN-HMM-GNN model performs excellently, with an average of 6.6 s per sample, indicating that it meets the real-time requirements for online applications. Additionally, the ANN-HMM-GNN model achieves an accuracy of 98.2%, significantly higher than other models, further proving its superiority in gymnastics action recognition tasks. In contrast, although the SVM and RF models have shorter training and inference times, their accuracy is lower, at 93.2% and 91.3%, respectively. The CNN and 1D-CNN models have relatively longer training times but moderate inference times, with higher accuracies of 94.7% and 94.2%. The LSTM model, while excelling in handling time series data, has longer training and inference times, with an accuracy of 95.4%. Overall, the ANN-HMM-GNN model strikes a good balance among training time, inference time, and accuracy, making it suitable for gymnastics action recognition tasks that require high accuracy and real-time feedback.

Gymnastics intelligent teaching model and analysis of environmental parameters

To evaluate the impact of environmental and physical parameters on the performance of the gymnastics intelligent teaching model, a detailed experimental analysis is conducted. The following key parameters are considered in the experiment: lighting conditions, differences in athlete height and weight, and types of sportswear. The detailed table of the experimental results is provided below.

Table 4 Influence of environmental parameters and physical parameters on model performance.

As shown in Table 4, the performance of the gymnastics intelligent teaching model varies under different environmental and physical parameters. In terms of lighting conditions, the model performs best under strong light, with accuracy, recall, and F1-score of 97.8%, 97.2%, and 97.5%, respectively. Performance slightly decreases under weak and variable lighting conditions. For athlete height and weight, the model performs best with athletes of medium height (160–180 cm) and medium weight (60–80 kg), achieving accuracy and F1-score of 97.5%. Regarding sportswear types, the model performs best with tight-fitting clothing, with accuracy and F1-score of 98.0% and 97.6%, respectively. These results indicate that, although the model shows slight performance differences under various conditions, its overall performance remains stable and excellent, demonstrating good robustness and adaptability to handle various real-world situations effectively.

Discussion

This study presents an ANN-based intelligent gymnastics teaching model and verifies its effectiveness and potential in gymnastics action recognition through a series of experiments and analyses. By combining the feature extraction capability of the ANN with the time-series modeling ability of the HMM, this study explores how artificial intelligence can improve gymnastics teaching, enhancing both action recognition accuracy and system response efficiency. The proposed model demonstrates significant advantages in gymnastics action recognition, particularly in accuracy and F1-score, outperforming traditional algorithms. Experimental results show that model depth, data preprocessing, and the number of hidden states all have a significant impact on performance; in particular, deeper networks and data normalization improve recognition accuracy. The high rejection rate of bending actions is mainly due to the ambiguity of their features: the execution style and amplitude of bending vary between individuals, making feature extraction more challenging. Moreover, the Kinect device may have accuracy limitations when capturing bending actions, which can also affect the model’s recognition performance. Future research could explore more advanced sensor technologies or optimized feature extraction methods to improve the recognition performance of bending actions.

Table 5 shows the performance comparison between ANN-HMM-GNN model and other related studies.

Table 5 Performance comparison between ANN-HMM-GNN model and other related studies.

The ANN-HMM-GNN model performs excellently in terms of accuracy, outperforming other studies. However, the initial investment and operational costs of the ANN-HMM-GNN model are relatively high, primarily due to the need for high-performance processors and Kinect devices. Despite the increased hardware and computational resource requirements, the model shows significant improvements in training results, learning efficiency, and user satisfaction compared to traditional methods. By optimizing the model structure and algorithms, it is possible to reduce computational costs while maintaining recognition performance. Overall, the ANN-HMM-GNN model offers significant advantages in improving training effectiveness and learning efficiency, providing an economical and efficient intelligent teaching solution for schools, training centers, and general users.


