
In recent years, successive versions of YOLO, such as YOLOv3 and YOLOv4, have made significant progress in facial expression recognition. These versions improved recognition accuracy and robustness by introducing new network architectures and training techniques such as batch normalization and residual connections. Research has shown that these improved versions of YOLO perform well under varied conditions, such as different lighting and facial images taken from different angles.
The work presented in this blog is a facial expression recognition system based on the YOLOv8 algorithm. I present the system interface, explain the algorithm principles in depth, provide the code implementation, and share the development process of the system. I hope this post can provide readers with some inspiration and promote more related research. The main contributions of this article are as follows:
1 Adopting the state-of-the-art YOLOv8 algorithm for facial expression recognition and comparing its results with YOLOv7, YOLOv6, YOLOv5, and other algorithms: although there have been multiple studies on facial expression detection, most of them use earlier deep learning models such as CNNs and ResNet. Compared with these models, the YOLOv8 algorithm exhibits higher efficiency and accuracy in several respects. This article provides a detailed introduction to using this recent object detection algorithm for facial expression detection, offering researchers and practitioners in related fields new research ideas and practical methods.
2 Implementing a facial expression recognition system using PySide6: this article explores how to develop a user-friendly facial expression recognition system with Python’s PySide6 library. The system lets users perform facial expression detection in an intuitive and convenient way, which helps promote the application of the YOLOv8 algorithm and drives the practical adoption of facial expression detection technology.
3 The system provides SQLite-based registration and login management. First-time users register through the registration interface; after they enter a username and password, the system stores this information in an SQLite database. After successful registration, users log in with the same credentials through the login interface. This design helps secure the system and leaves room for more personalized functions in the future (a minimal sketch of this login flow appears after this list).
4 In-depth study of the YOLOv8 model: while using the YOLOv8 algorithm for facial expression detection and recognition, I conduct a detailed study of the algorithm’s performance, including evaluation of key indicators such as precision and recall, as well as analysis of the model’s behavior in different environments and conditions. This supports a more comprehensive understanding of the YOLOv8 algorithm and provides a foundation for its further optimization and improvement.
5 On the main interface, the system supports picture, video, real-time camera, and batch file input. Users can select an image or video for facial expression detection by clicking the corresponding button, or start the camera for real-time detection. During detection, the system displays the results in real time and stores the detection records in the database.

6 In addition, the system provides one-click replacement of YOLOv8/v5 models. Users can switch to a different YOLOv8 model by clicking the “Change Model” button on the interface. The dataset shipped with the system can also be used to retrain the model to meet detection needs in different scenarios (a sketch of such a model-swap slot follows below).
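
To make the registration and login flow in contribution 3 concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The file name users.db, the table schema, and the SHA-256 hashing are illustrative assumptions, not the exact code used in the system:

```python
import hashlib
import sqlite3

DB_PATH = "users.db"  # hypothetical database file name

def init_db():
    # Create the user table on first run.
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS users ("
            "username TEXT PRIMARY KEY, password_hash TEXT NOT NULL)"
        )

def register(username: str, password: str) -> bool:
    # Store only a hash of the password, never the plain text.
    pw_hash = hashlib.sha256(password.encode("utf-8")).hexdigest()
    try:
        with sqlite3.connect(DB_PATH) as conn:
            conn.execute("INSERT INTO users VALUES (?, ?)", (username, pw_hash))
        return True
    except sqlite3.IntegrityError:  # username already taken
        return False

def login(username: str, password: str) -> bool:
    # Succeeds only if the username exists and the hash matches.
    pw_hash = hashlib.sha256(password.encode("utf-8")).hexdigest()
    with sqlite3.connect(DB_PATH) as conn:
        row = conn.execute(
            "SELECT 1 FROM users WHERE username = ? AND password_hash = ?",
            (username, pw_hash),
        ).fetchone()
    return row is not None
```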
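For the one-click model replacement in contribution 6, one plausible implementation is a PySide6 slot that opens a file dialog and reloads the detector. The slot name and the self.model attribute below are hypothetical, not taken from the system’s actual code:

```python
from PySide6.QtWidgets import QFileDialog
from ultralytics import YOLO

def change_model(self):
    # Hypothetical slot bound to the "Change Model" button:
    # let the user pick a weight file and reload the detector.
    path, _ = QFileDialog.getOpenFileName(
        self, "Select model weights", ".", "YOLO weights (*.pt)"
    )
    if path:
        # Assumes the weights are in an ultralytics-compatible .pt format.
        self.model = YOLO(path)
```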
Data preprocessing
In the preprocessing stage, we applied automatic orientation correction to ensure that all images share a consistent spatial orientation, reducing unnecessary complexity during model learning. In addition, all images were uniformly stretched to a resolution of 640x640 pixels, standardizing the input size and providing a uniform basis for feature extraction. For category labeling, we made careful adjustments: 4 categories were remapped and 2 categories were removed, improving the separability between categories and ensuring that the model focuses on learning expressions with sufficient samples. Notably, the dataset received no augmentation of any form, so that we could evaluate the model’s performance on raw, unaugmented images. Although this choice may reduce the model’s adaptability to novel environments, it lets us observe the model’s baseline performance under basic conditions.
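
As an illustration of these two preprocessing steps (auto-orientation and stretching to 640x640), here is a minimal sketch using Pillow. The original pipeline’s exact tooling is not specified, so this is an assumption:

```python
from PIL import Image, ImageOps

def preprocess(path: str, size: int = 640) -> Image.Image:
    img = Image.open(path)
    # Auto-orient: apply the EXIF orientation tag so that all images
    # share the same spatial direction.
    img = ImageOps.exif_transpose(img)
    # Stretch (not letterbox) to a fixed 640x640 resolution.
    return img.resize((size, size), Image.BILINEAR)
```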

The following figure shows the distribution of instances across the classes in the dataset: Anger, Contempt, Disgust, Fear, Happy, Neutral, Sad, and Surprise. The distribution chart shows that these emotion categories are spread relatively evenly, a balance achieved through careful design and data filtering. A uniform class distribution is crucial for preventing algorithmic bias and ensuring that the model does not favor over-represented categories. Since the instance counts differ little across expression categories, the dataset is well suited for model training.
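To reproduce such a distribution check on your own data, you can count instances per class directly from YOLO-format label files (one "class x y w h" row per instance). The labels/train directory layout below is a hypothetical assumption:

```python
from collections import Counter
from pathlib import Path

CLASS_NAMES = ["Anger", "Contempt", "Disgust", "Fear",
               "Happy", "Neutral", "Sad", "Surprise"]

counts = Counter()
for label_file in Path("labels/train").glob("*.txt"):
    for line in label_file.read_text().splitlines():
        if line.strip():
            # The first field of each row is the class index.
            counts[CLASS_NAMES[int(line.split()[0])]] += 1

for name in CLASS_NAMES:
    print(f"{name:10s} {counts[name]}")
```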

ImageDataGenerator() is a Keras image generator that can also augment data in batches (through rotation, deformation, normalization, and so on), expanding the effective dataset size and improving the model’s generalization ability.
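The original training snippet is not reproduced in this excerpt, so the following is a minimal sketch of how ImageDataGenerator is typically wired into training. The small Keras classifier and the data/train/<class_name>/ directory layout are hypothetical assumptions:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# A tiny placeholder classifier; the real model would be far larger.
model = models.Sequential([
    layers.Input(shape=(640, 640, 3)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(8, activation="softmax"),  # 8 expression classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Augment on the fly: rescaling to [0, 1] plus small geometric jitter.
train_gen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=15,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
)

# Hypothetical directory layout: data/train/<class_name>/*.jpg
train_data = train_gen.flow_from_directory(
    "data/train", target_size=(640, 640),
    batch_size=32, class_mode="categorical",
)

history = model.fit(train_data, epochs=50)
```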
Analysis of experimental results
The four YOLO versions (v5n, v6, v7, v8n) have very close mAP scores (0.493, 0.493, 0.492, and 0.497, respectively), indicating comparable overall detection capability. With a mAP of 0.497, YOLOv8n is slightly better than the other versions. While the four versions perform similarly on F1-Score, YOLOv8n’s slight advantage in mAP indicates that it performs marginally better overall on this object detection task.

System process
1 After the user launches the application, the system creates an instance of the MainWindow class. This instance initializes the interface and the related parameters of the entire application, giving the user a starting point for operation (a minimal bootstrap sketch appears after this list).
2 The application provides an intuitive interface through which the user selects an input source. Input sources can be frames captured by the camera in real time, video files, or still images.
3 Once the user has selected the input source, the system calls the relevant media handlers and methods to process the input data. This may involve configuring the camera, reading a video file, or loading an image file.
4 When the media input source is ready, the system enters a continuous frame processing loop (see the inference loop sketch after this list). The specific process is as follows:
Pre-processing phase: the system pre-processes each frame, which may include image scaling, color space conversion, and normalization to meet the input requirements of the YOLO model.
Detection and recognition phase: the preprocessed image is sent to the YOLOv8 model for facial expression detection and recognition; the model outputs the face positions and the corresponding expression categories.
Interface update phase: as detection results are generated, the interface updates in real time, drawing the detection boxes, labeling the expression categories, and showing detection statistics in the interface’s table or bar chart.
Interactive operation: users can perform a variety of actions through the buttons provided by the interface, such as saving detection results, querying author and version information, and filtering and analyzing specific detection results through drop-down menus.
Media control: users can also control the playback state of the media, such as starting or stopping camera capture, video playback, or image analysis.
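
The following is a minimal sketch of the bootstrap described in step 1. The window title and class internals are placeholders, as the full MainWindow implementation is not shown here:

```python
import sys
from PySide6.QtWidgets import QApplication, QMainWindow

class MainWindow(QMainWindow):
    # Skeleton of the window in step 1; the real project would build
    # the full UI, load the YOLOv8 model, and wire up the database here.
    def __init__(self):
        super().__init__()
        self.setWindowTitle("Facial Expression Recognition")

if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = MainWindow()
    window.show()
    sys.exit(app.exec())
```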
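Step 4’s pre-processing, detection, and interface-update phases can be sketched as a plain OpenCV loop, stripped of the Qt interface for clarity. The weight file best.pt is a hypothetical placeholder for trained expression weights:

```python
import cv2
from ultralytics import YOLO

model = YOLO("best.pt")  # hypothetical trained expression weights

cap = cv2.VideoCapture(0)  # camera input; a video file path works too
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Ultralytics handles resizing/normalization internally before inference.
    results = model(frame)[0]
    for box in results.boxes:
        # Draw the detection box and label it with class and confidence.
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        label = results.names[int(box.cls[0])]
        conf = float(box.conf[0])
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, f"{label} {conf:.2f}", (x1, y1 - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imshow("detections", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```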

If you have any questions, need the full code or data, or have customized development requirements, you can contact me by email: slowlon@foxmail.com