Real-time vehicle and pedestrian multi-object detection and tracking system (PyQt/PySide6 UI, Python code)

The system uses PySide6 as a GUI library to provide an intuitive and user-friendly interface. Below, I will introduce the functions and design of each main interface in detail.

(1) The system provides SQLite-based registration and login management. First-time users register through the registration interface: after they enter a username and password, the system stores this information in the SQLite database. Once registered, users can log in through the login interface with the same credentials. This design secures access to the system and leaves room for adding more personalized features in the future.
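A minimal sketch of how the SQLite-backed registration and login could be wired up. The table layout and function names are illustrative, and passwords are hashed before storage here even though the description above does not specify hashing:

```python
import hashlib
import sqlite3

DB_PATH = "users.db"  # illustrative path

def _hash(password: str) -> str:
    # The original text does not specify hashing; storing a digest
    # instead of the raw password is an added safety assumption.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def init_db() -> None:
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS users ("
            "username TEXT PRIMARY KEY, password_hash TEXT NOT NULL)"
        )

def register(username: str, password: str) -> bool:
    try:
        with sqlite3.connect(DB_PATH) as conn:
            conn.execute(
                "INSERT INTO users (username, password_hash) VALUES (?, ?)",
                (username, _hash(password)),
            )
        return True
    except sqlite3.IntegrityError:  # username already taken
        return False

def login(username: str, password: str) -> bool:
    with sqlite3.connect(DB_PATH) as conn:
        row = conn.execute(
            "SELECT password_hash FROM users WHERE username = ?",
            (username,),
        ).fetchone()
    return row is not None and row[0] == _hash(password)
```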

(2) The main interface supports image, video, real-time camera, and batch file input. Users can select an image or video for multi-object detection, counting, and tracking by clicking the corresponding button, or start the camera for real-time detection. During detection, the system displays the results in real time, tracks each object, and stores the detection records in the database.

(3) The system also supports changing the YOLOv8 model with one click. Users can select a different YOLOv8 model for detection via the “Change Model” button on the interface. The dataset bundled with the system can also be used to retrain the model to meet detection needs in different scenarios.
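The one-click switch can be as simple as re-instantiating the Ultralytics YOLO wrapper with the newly chosen weights file. The slot below is a hypothetical handler for the “Change Model” button, not the project's actual code:

```python
from PySide6.QtWidgets import QFileDialog
from ultralytics import YOLO

def on_change_model_clicked(self):
    # Hypothetical slot on the main window: let the user pick a .pt
    # weights file and swap the active model in place.
    path, _ = QFileDialog.getOpenFileName(
        self, "Select YOLOv8 weights", "", "Model files (*.pt)"
    )
    if path:
        self.model = YOLO(path)  # later frames use the new model
```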

A multi-object tracking solution usually comprises two core steps: object detection and data association. (1) In the object detection stage, YOLOv8 and YOLOv5 serve as efficient deep learning models that identify the location and category of each target in a video frame. This step is the foundation of the tracking process: the later steps can only work on the basis of accurate detections. (2) Data association maintains each target's identity across successive frames. The ByteTrack algorithm chosen here achieves efficient tracking by associating detection boxes across frames. More specifically, ByteTrack refines the association strategy of traditional tracking algorithms and can accurately re-identify and keep tracking a target even when it is occluded or temporarily disappears, effectively reducing identity switches.
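Ultralytics ships ByteTrack as a built-in tracker, so detection and association can be driven through a single call. A minimal sketch, assuming an illustrative video path and the stock yolov8n weights:

```python
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # detection stage; a YOLOv5 model works similarly
cap = cv2.VideoCapture("traffic.mp4")  # illustrative input path

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # persist=True keeps tracker state between frames so ByteTrack can
    # re-associate a target after a short occlusion or disappearance.
    results = model.track(frame, persist=True, tracker="bytetrack.yaml")
    if results[0].boxes.id is not None:
        for box, track_id in zip(results[0].boxes.xyxy, results[0].boxes.id):
            print(int(track_id), box.tolist())  # stable ID per target
cap.release()
```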

Our dataset contains 5542 high-quality images covering a variety of scenarios, providing a rich basis for training and testing. Specifically, it includes 2856 training images, 1343 validation images, and 1343 test images, so the model can be trained on diverse data while its generalization ability is effectively evaluated on the validation and test sets.

During pre-processing, all images are automatically orientation-corrected to eliminate inconsistencies caused by different device shooting angles. EXIF metadata is also stripped, which slims down the image files and speeds up subsequent processing. Finally, to meet the model's input requirements, all images are resized to 416x416 pixels. Although this stretching can distort the scale of some images, it gives the model a uniform input, helping to simplify the architecture and speed up inference.
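These steps can be reproduced with Pillow; the folder layout below is illustrative, while the orientation fix, EXIF removal, and 416x416 stretch mirror the description above:

```python
from pathlib import Path
from PIL import Image, ImageOps

def preprocess(src: Path, dst: Path, size: int = 416) -> None:
    img = Image.open(src)
    # Apply the EXIF orientation tag, fixing device-rotation issues.
    img = ImageOps.exif_transpose(img)
    # Stretch to the fixed model input size; aspect ratio is not kept,
    # matching the trade-off noted above.
    img = img.resize((size, size), Image.BILINEAR)
    # Re-encoding the pixels drops the original EXIF metadata.
    img.save(dst)

for path in Path("raw").glob("*.jpg"):  # illustrative folders
    preprocess(path, Path("processed") / path.name)
```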

In the frame_process function, each video frame is processed by the YOLOv8 model. First, we resize the frame to fit the window and pre-process it. The model then runs inference on the frame, outputting the predictions and the inference time. We parse the predictions and draw the recognized objects and their trajectories on the image; a track history is maintained so that each object's path can be plotted over time.
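The exact signature of frame_process is not shown above, so the sketch below makes assumptions: it uses the Ultralytics track API with ByteTrack and a defaultdict of box centres as the track history:

```python
from collections import defaultdict

import cv2
import numpy as np
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
track_history = defaultdict(list)  # track ID -> recent centre points

def frame_process(frame, size=(640, 480)):
    frame = cv2.resize(frame, size)  # fit the display window
    results = model.track(frame, persist=True, tracker="bytetrack.yaml")
    annotated = results[0].plot()  # boxes and labels drawn by Ultralytics
    if results[0].boxes.id is not None:
        for box, track_id in zip(results[0].boxes.xywh, results[0].boxes.id):
            x, y, w, h = box.tolist()
            history = track_history[int(track_id)]
            history.append((int(x), int(y)))  # box centre
            if len(history) > 30:             # keep a short trail
                history.pop(0)
            pts = np.array(history, dtype=np.int32).reshape(-1, 1, 2)
            cv2.polylines(annotated, [pts], False, (0, 255, 0), 2)
    return annotated
```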

In the main program, we load the pretrained model and set up a color mapping so that different categories are displayed in different colors. We then create a MediaHandler instance to process the video file, connect each decoded frame to our frame_process function, and start playback.
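MediaHandler's real API is not shown in the text, so the stand-in below is an assumption: a QObject that reads frames with OpenCV on a QTimer and emits them as a signal, reusing frame_process from the previous sketch. The per-class colors come from a fixed palette keyed by the model's class IDs:

```python
import cv2
from PySide6.QtCore import QObject, QTimer, Signal
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # pretrained model, as in the main program

# Fixed color per class ID so every category renders consistently.
palette = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]
colors = {cls_id: palette[cls_id % len(palette)] for cls_id in model.names}

class MediaHandler(QObject):
    """Minimal stand-in for the project's MediaHandler: decodes frames
    and emits each one; signal and method names are assumptions."""
    frame_ready = Signal(object)

    def __init__(self, source):
        super().__init__()
        self.cap = cv2.VideoCapture(source)
        self.timer = QTimer()
        self.timer.timeout.connect(self._next_frame)

    def play(self, fps=30):
        self.timer.start(int(1000 / fps))  # needs a running Qt event loop

    def _next_frame(self):
        ok, frame = self.cap.read()
        if ok:
            self.frame_ready.emit(frame)
        else:
            self.timer.stop()

handler = MediaHandler("traffic.mp4")       # illustrative path
handler.frame_ready.connect(frame_process)  # from the sketch above
handler.play()
```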

Our design follows the principles of modularity and responsiveness to ensure the efficiency and scalability of the system.

First, the core of the system is the MainWindow class, which acts as the main controller of the application: it launches the interface, initializes parameters, and coordinates the various subsystems. It is the entry point to the graphical interface and the starting point of the user's interaction, letting users easily select a video stream, live camera, or image file as the input source to start the object detection process.
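A skeleton of what such a MainWindow might look like in PySide6; the button labels and handler names are illustrative:

```python
import sys

from PySide6.QtWidgets import (QApplication, QMainWindow, QPushButton,
                               QVBoxLayout, QWidget)

class MainWindow(QMainWindow):
    """Main controller: builds the UI and routes input-source choices."""

    def __init__(self):
        super().__init__()
        self.setWindowTitle("Vehicle & Pedestrian Detection")
        layout = QVBoxLayout()
        for label, handler in (("Open Image", self.open_image),
                               ("Open Video", self.open_video),
                               ("Start Camera", self.start_camera)):
            button = QPushButton(label)
            button.clicked.connect(handler)
            layout.addWidget(button)
        container = QWidget()
        container.setLayout(layout)
        self.setCentralWidget(container)

    def open_image(self): ...   # delegate to the image processor
    def open_video(self): ...   # delegate to the video processor
    def start_camera(self): ... # delegate to the live-camera processor

if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = MainWindow()
    window.show()
    sys.exit(app.exec())
```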

When the user selects an input source, MainWindow dynamically calls the corresponding media processor to configure and read the data. For video files or images, the processor loads data from storage; for a live camera, it captures and streams frames in real time. The system then enters a continuous processing cycle that acquires and processes image data in real time.
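The dispatch itself can be compact; a sketch, assuming camera index 0 and a couple of illustrative file extensions:

```python
import cv2

def open_source(source: str):
    # Route the user's choice to the matching media reader.
    if source == "camera":
        return cv2.VideoCapture(0)       # live capture, default camera
    if source.endswith((".mp4", ".avi")):
        return cv2.VideoCapture(source)  # video file from storage
    return cv2.imread(source)            # single image from storage
```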

Each iteration of this cycle begins with a pre-processing stage that resizes the image and converts its color space to fit the input requirements of the YOLO model. The pre-processed image is then fed to the deep learning model, here YOLOv8, which can quickly and accurately detect multiple objects in the image. The model outputs each target's location and category, providing the information needed for subsequent interface updates and user interaction.
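A sketch of that pre-processing step, assuming OpenCV BGR frames and a square model input (the exact input size is configurable in Ultralytics):

```python
import cv2
import numpy as np

def preprocess(frame: np.ndarray, size: int = 640) -> np.ndarray:
    # Resize to the model's expected square input.
    frame = cv2.resize(frame, (size, size))
    # OpenCV captures frames in BGR order; convert to the RGB order
    # that most YOLO pipelines expect.
    return cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
```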

As the model produces detection results, the interface updates in real time to reflect them, drawing detection boxes and category labels. The interface also offers statistics and analysis functions, such as displaying the count and category distribution of detected targets. Users can interact through the interface, for example saving results, opening help information, or filtering the view to specific categories.
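The count and category distribution can be derived directly from an Ultralytics result object; the function name below is illustrative:

```python
from collections import Counter

def class_distribution(result, names):
    # Tally detections per category for the statistics panel.
    class_ids = result.boxes.cls.int().tolist()
    counts = Counter(names[i] for i in class_ids)
    return len(class_ids), counts  # total count, per-class distribution

# Usage with the tracking results from the earlier sketches:
#   total, dist = class_distribution(results[0], model.names)
```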

To improve the user experience, we have also added media controls. Users can start or pause analysis of the video stream at any time, control video playback, or stop camera capture. This gives users a high degree of control and makes the whole system more flexible and responsive to user requests.
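On top of the QTimer-driven loop, these controls reduce to starting and stopping the timer; the class and method names below are illustrative, not the project's actual API:

```python
from PySide6.QtCore import QTimer

class PlaybackControls:
    """Sketch of play/pause/stop for the analysis loop."""

    def __init__(self, timer: QTimer, capture):
        self.timer = timer      # drives frame reads
        self.capture = capture  # cv2.VideoCapture instance

    def pause(self):
        self.timer.stop()       # freeze analysis without losing state

    def resume(self, fps: int = 30):
        self.timer.start(int(1000 / fps))

    def stop(self):
        self.timer.stop()
        self.capture.release()  # end camera capture / close the file
```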

If you have any questions, need the full code or dataset, or have custom development requirements, you can contact me by email: slowlon@foxmail.com
