Mitigating the Impact of Thermal Reflections in Object Detection Using Vision-Language Models

Authors

  • Radosław Feiglewicz, AGH University of Krakow
  • Andrzej Kos, AGH University of Krakow

Abstract

Thermal imaging is increasingly employed for navigation in challenging conditions such as dense smoke or fog. However, the limited availability of thermal images compared to RGB data makes training deep learning models, such as Convolutional Neural Networks (CNNs), significantly more difficult and often yields unsatisfactory results. Vision-Language Models (VLMs), which can perform tasks with little or no task-specific training, hold the potential to overcome current limitations in thermal imaging applications. This paper introduces a method leveraging VLMs to reduce the impact of reflections in thermal images on object detection accuracy, with a particular focus on human detection. The proposed approach improves the F1-score from 0.83 to 0.97 on a dedicated evaluation dataset, outperforming a baseline solution based solely on the widely used YOLOv11 model. Furthermore, we investigate the effects of quantization on various open-source VLMs, analyzing their performance, processing speed, and memory requirements.
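
The abstract describes a two-stage idea: YOLOv11 proposes person detections on a thermal frame, and a VLM then judges whether each candidate is a real person or a thermal reflection. The sketch below is a minimal illustration of that pipeline, not the authors' implementation: it assumes the Ultralytics YOLOv11 API, and the VLM query `vlm_confirms_person` is a hypothetical placeholder, since the paper's abstract does not specify which open-source VLM or prompt is used.

```python
import cv2
from ultralytics import YOLO  # assumes the Ultralytics YOLOv11 weights/API

PERSON_CLASS_ID = 0  # COCO class index for "person"


def vlm_confirms_person(crop) -> bool:
    """Hypothetical VLM query: return True if the crop shows a real person,
    False if it looks like a thermal reflection (e.g. on glass or metal)."""
    raise NotImplementedError("Plug in an open-source VLM of your choice.")


def detect_people(image_path: str, conf_threshold: float = 0.25):
    """First-stage YOLOv11 detection followed by a VLM reflection check."""
    model = YOLO("yolo11n.pt")  # YOLOv11 nano checkpoint
    frame = cv2.imread(image_path)
    results = model(frame, conf=conf_threshold)[0]

    kept = []
    for box in results.boxes:
        if int(box.cls[0]) != PERSON_CLASS_ID:
            continue
        x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
        crop = frame[y1:y2, x1:x2]
        # Second-stage check: keep only detections the VLM judges to be real,
        # discarding reflection-induced false positives.
        if vlm_confirms_person(crop):
            kept.append((x1, y1, x2, y2, float(box.conf[0])))
    return kept
```

A quantized open-source VLM could back the placeholder, which is the trade-off the abstract's quantization study (accuracy vs. speed vs. memory) addresses.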

Published

2026-02-17

Section

Image Processing