Keywords:
feature pyramid
lightweight
pixel intersection over union
real-time DEtection TRansformer
small object detection
Abstract:
To address the challenges of small-target detection in aerial images captured by unmanned aerial vehicles, including complex backgrounds, tiny and densely packed targets, and the difficulty of deploying models on mobile devices, this paper proposes STD-DETR, an improved lightweight small-target detection algorithm based on the real-time DEtection TRansformer (RT-DETR) model. First, RepConv is introduced to improve the lightweight StarNet network, which then replaces the original backbone to reduce model size and computation. Second, a novel feature pyramid is designed that incorporates a 160 pixel × 160 pixel feature map output at the P2 layer to enrich small-target information. This design replaces the traditional approach of adding a P2 small-target detection head, and it introduces the CSP-OmniKernel-squeeze-excitation (COSE) module and space-to-depth (SPD) convolution to enhance global feature extraction and multiscale feature fusion. Finally, pixel intersection over union (PIoU) replaces the original model's loss function; by calculating IoU at the pixel level, it captures small overlapping regions more precisely, reducing the miss rate and improving detection accuracy. Experimental results on the VisDrone2019 dataset show that, compared with the baseline model, STD-DETR improves accuracy, recall, and mAP50 by 1.3, 2.2, and 2.3 percentage points, respectively, while reducing computational cost and parameter count by approximately 34.0% and 37.9%, respectively. Generalization tests on the TinyPerson dataset show gains of 3.7 percentage points in accuracy and 3.1 percentage points in mAP50, confirming the model’s effectiveness and generalization capability. © 2025 Universitat zu Koln. All rights reserved.