Abstract

Computer vision has progressed rapidly in recent years, but real-time performance on low-power devices remains challenging. We propose a lightweight Feature Pyramid Network (LFPN) with adaptive attention that reduces computation while preserving accuracy. On COCO, our model achieves a +2.4-point mAP@0.5 gain over YOLOv8-s with a 15% speedup.

Methodology

We adopt an improved CSPDarknet backbone and introduce depthwise separable convolutions to reduce parameters. For data augmentation, we apply Mosaic and Mixup strategies. We also design an IoU-based loss to improve bounding-box regression convergence.

Backbone Design

Improved CSPDarknet with depthwise separable convolutions to reduce parameters.
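As a rough sketch of why this reduces parameters (the layer sizes below are illustrative, not taken from the paper), the savings of a depthwise separable convolution over a standard one can be counted directly:

```python
def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def dw_separable_params(c_in, c_out, k):
    """Depthwise k x k conv (one filter per input channel)
    followed by a 1 x 1 pointwise conv (bias omitted)."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise

# Example: a 3x3 layer mapping 256 -> 256 channels.
standard = conv_params(256, 256, 3)           # 589,824
separable = dw_separable_params(256, 256, 3)  # 67,840
print(f"reduction: {1 - separable / standard:.1%}")
```

For this example layer the separable form uses about 88% fewer weights, which is where most of the backbone's parameter savings come from.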

Data Augmentation

Mosaic + Mixup to improve robustness under occlusion and motion blur.
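As an illustration of the Mixup half of this strategy (Mosaic tiling is omitted), a minimal NumPy sketch; `alpha=0.2` is a common default, not a value from the paper:

```python
import numpy as np

def mixup(img_a, img_b, label_a, label_b, alpha=0.2, rng=None):
    """Blend two images and their one-hot labels with a Beta-sampled weight."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)  # mixing coefficient in (0, 1)
    img = lam * img_a + (1.0 - lam) * img_b
    label = lam * label_a + (1.0 - lam) * label_b
    return img, label, lam

# Toy usage: blend a black and a white 2x2 "image".
img_a, img_b = np.zeros((2, 2)), np.ones((2, 2))
label_a, label_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
img, label, lam = mixup(img_a, img_b, label_a, label_b, rng=np.random.default_rng(0))
```

For detection, the box labels of both source images are typically kept rather than blended; the sketch above shows only the classification-style formulation.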

Loss Function

A redesigned IoU-based loss to speed up box regression convergence.
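The redesigned loss is not spelled out in this section, so the sketch below shows only the plain 1 − IoU baseline it builds on; the (x1, y1, x2, y2) box format and helper names are our own:

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def iou_loss(pred, target):
    """Baseline 1 - IoU loss; a redesigned variant would refine this."""
    return 1.0 - iou(pred, target)
```

Variants such as GIoU or CIoU add penalty terms to this baseline so that non-overlapping boxes still receive a useful gradient, which is what accelerates regression convergence.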

Architecture Diagram Placeholder (Place image in /public/architecture.png)

Experimental Results

We evaluate on COCO val2017. Experiments run on an NVIDIA T4 GPU with an input resolution of 640×640.
Model       Backbone      mAP@0.5 (%)   FPS (T4)   Params (M)
YOLOv5-s    CSPDarknet    37.4          142        7.2
YOLOv8-s    CSPDarknet    44.9          120        11.1
Ours-LFPN   MobileNetV3   47.3          138        6.8

Our model improves mAP@0.5 by 2.4 points over YOLOv8-s while using roughly 39% fewer parameters (6.8 M vs. 11.1 M).
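A quick sanity check of the headline numbers, computed directly from the table's values:

```python
yolov8s_map, ours_map = 44.9, 47.3        # mAP@0.5 (%) from the table
yolov8s_params, ours_params = 11.1, 6.8   # params (M) from the table

map_gain = ours_map - yolov8s_map                    # 2.4 points
param_reduction = 1 - ours_params / yolov8s_params   # about 39%
print(f"mAP gain: {map_gain:.1f} points, params: {param_reduction:.1%} fewer")
```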

Qualitative Result A (Placeholder)
Confusion Matrix (Placeholder)

