Expert Systems with Applications  ·  Elsevier  ·  2026

PhysSeg: Physics-Informed Multi-Scale Pattern Encoding for Fire and Smoke Instance Segmentation

Muhammad Ayaz1, Sareer ul Amin1, Hikmat Yar2, Salman Khan3, Yonghoon Jung1, Sanghyun Seo1,*
1 Chung-Ang University, Republic of Korea  |  2 KAIST, Republic of Korea  |  3 Oxford Brookes University, United Kingdom
* Corresponding Author  ·  sanghyun@cau.ac.kr
PhysSeg is an end-to-end physics-informed instance segmentation framework. It unifies a Multi-Scale Pattern Encoder (MSPE), which applies class-specific small-scale kernels (3×3) for fire's sharp boundaries and large-scale kernels (5×5, 7×7) for smoke's diffuse gradients, with an Adaptive Fusion Module (AFM) that fuses the two branches via pixel-wise spatial attention, achieving 40.2% AP at 17.2 FPS on a new 7,917-image dual-class benchmark.

Abstract

Fire and smoke incidents pose serious threats to human life, critical infrastructure, and ecosystems, creating an urgent need for accurate and timely detection systems. Existing methods face major challenges in instance segmentation due to the distinct visual characteristics of the two hazard classes: fire exhibits sharp boundaries driven by combustion dynamics, whereas smoke presents diffuse, gradually varying patterns influenced by atmospheric conditions, scale, and illumination. Current approaches employ uniform feature extraction for both classes, failing to exploit these physical differences and consequently producing inaccurate instance masks in complex real-world scenarios. To address these limitations, we propose PhysSeg, a physics-informed multi-scale pattern segmentation network comprising two complementary components: a Multi-Scale Pattern Encoder (MSPE), which applies class-specific convolution kernels to independently capture fire's sharp boundary structure and smoke's diffuse spatial gradients, and an Adaptive Fusion Module (AFM), which dynamically integrates the dual-pattern features through spatial attention. We further introduce a large-scale dual-class dataset of 7,917 polygon-annotated images spanning indoor, outdoor, daytime, nighttime, and unmanned aerial vehicle (UAV) environments, establishing a comprehensive benchmark for fire and smoke instance segmentation. Experimental results demonstrate that PhysSeg achieves 40.2% mask average precision (AP), outperforming PointRend by +5.2%, Mask2Former by +5.4%, and CondInst by +6.0%, while maintaining 17.2 frames per second (FPS). Zero-shot evaluation on three external benchmarks achieves 34.4% mean AP, exceeding all five competing methods including Mask2Former with a Swin-Tiny backbone by +2.3%, confirming robust generalisation to unseen fire and smoke domains.

Highlights

Multi-Scale Pattern Encoder

Class-specific small-scale kernels (3×3) for fire's sharp boundaries and large-scale kernels (5×5, 7×7) for smoke's diffuse gradients — applied across all FPN levels.
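The dual-branch idea above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' released code: the module name, channel width, and exact branch depths are assumptions; only the kernel scales (3×3 for fire, 5×5 and 7×7 for smoke) come from the paper.

```python
import torch
import torch.nn as nn

class MultiScalePatternEncoder(nn.Module):
    """Hypothetical sketch of the MSPE applied to one FPN level:
    a small-kernel fire branch and a larger-kernel smoke branch."""

    def __init__(self, channels=256):
        super().__init__()
        # Fire branch: 3x3 kernels for sharp combustion boundaries.
        self.fire_branch = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Smoke branch: 5x5 then 7x7 kernels for diffuse spatial gradients.
        self.smoke_branch = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=7, padding=3),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # Same input, two class-specialised feature maps of identical shape.
        return self.fire_branch(x), self.smoke_branch(x)
```

Padding is chosen so both branches preserve spatial resolution, which lets the downstream fusion module combine them pixel by pixel.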

Adaptive Fusion Module

Pixel-wise softmax-normalized spatial attention weights dynamically integrate complementary fire and smoke features for precise instance-level discrimination.
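A pixel-wise softmax fusion of two feature branches could look like the following sketch. The 1×1 attention head and its input concatenation are assumptions; the source only states that the AFM produces softmax-normalised per-pixel weights over the fire and smoke branches.

```python
import torch
import torch.nn as nn

class AdaptiveFusionModule(nn.Module):
    """Sketch of the AFM: predict two logits per pixel and take a
    softmax-weighted (convex) combination of the two branches."""

    def __init__(self, channels=256):
        super().__init__()
        # One attention logit per branch (fire, smoke) at every pixel.
        self.attn = nn.Conv2d(2 * channels, 2, kernel_size=1)

    def forward(self, fire_feat, smoke_feat):
        logits = self.attn(torch.cat([fire_feat, smoke_feat], dim=1))
        w = torch.softmax(logits, dim=1)  # (B, 2, H, W), weights sum to 1
        # Broadcast the per-pixel weights across all channels.
        return w[:, 0:1] * fire_feat + w[:, 1:2] * smoke_feat
```

Because the weights are softmax-normalised, the fused map is everywhere a convex combination of the two inputs, so fire-dominated pixels can lean on the sharp-boundary branch while smoke regions lean on the diffuse branch.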

40.2% AP · 17.2 FPS

+5.2% AP over PointRend, +5.4% over Mask2Former, +6.0% over CondInst with real-time throughput on a single GPU.

Physics-Informed Design

Kernel scale selection guided by the physical properties of fire (combustion dynamics, sharp edges) and smoke (atmospheric diffusion, gradual gradients).

7,917-Image Dataset

First large-scale dual-class instance segmentation benchmark for fire and smoke with polygon-level annotations across indoor, outdoor, daytime, nighttime, and UAV scenes.

Zero-Shot Generalisation

34.4% mean AP on three unseen external benchmarks — outperforming all five state-of-the-art competitors without any fine-tuning.

Architecture


Overall architecture of PhysSeg. The framework comprises a DCNv2-enhanced ResNet-50 backbone, Feature Pyramid Network (FPN), Multi-Scale Pattern Encoder (MSPE) with parallel fire and smoke branches, Adaptive Fusion Module (AFM) with spatial attention, and CondInst dynamic mask head.
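The per-level flow described above can be summarised in a toy sketch. The DCNv2 backbone, FPN, and CondInst mask head are stubbed out; the module and kernel choices here are illustrative assumptions, showing only how the pattern branches and attention fusion chain together over FPN levels.

```python
import torch
import torch.nn as nn

class PhysSegNeck(nn.Module):
    """Toy sketch of the PhysSeg neck: for each FPN level, run parallel
    fire/smoke pattern branches, then fuse them with pixel-wise
    softmax attention before the (stubbed) CondInst mask head."""

    def __init__(self, channels=256):
        super().__init__()
        self.fire = nn.Conv2d(channels, channels, 3, padding=1)   # sharp boundaries
        self.smoke = nn.Conv2d(channels, channels, 7, padding=3)  # diffuse gradients
        self.attn = nn.Conv2d(2 * channels, 2, 1)                 # per-pixel fusion weights

    def forward(self, fpn_levels):
        fused = []
        for feat in fpn_levels:
            ff = torch.relu(self.fire(feat))
            sf = torch.relu(self.smoke(feat))
            w = torch.softmax(self.attn(torch.cat([ff, sf], dim=1)), dim=1)
            fused.append(w[:, 0:1] * ff + w[:, 1:2] * sf)
        return fused  # would feed the dynamic mask head
```

Applying the same shared branches at every pyramid level matches the paper's claim that the pattern encoding operates across all FPN levels.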

Dataset Overview

7,917 images · Dual-class instance annotations (fire + smoke) · Polygon masks · Indoor / Outdoor / Industrial / Wildfire / Vehicle / UAV scenes

7,917 Total Images · 2 Classes · Instance-Level Annotations · Polygon Masks
Dataset overview — diverse conditions across all scene categories

Dataset Comparison

Dataset        | Images | Fire | Smoke | Mask Type | Level    | Year
---------------|--------|------|-------|-----------|----------|-----
BoWFire        | 226    |      |       | Binary    | Semantic | 2015
FESB MLID      | 400    |      |       | Binary    | Semantic | 2017
SMOKE5K        | 5,400  |      |       | Binary    | Semantic | 2022
FSSD           | 1,968  |      |       | Binary    | Semantic | 2023
FLAME          | 2,003  |      |       | Binary    | Semantic | 2023
IoT-Fire       | 1,100  |      |       | Binary    | Semantic | 2023
Ours (PhysSeg) | 7,917  | ✓    | ✓     | Polygon   | Instance | 2026

PhysSeg Visual Results — Diverse Conditions & Scenarios

Model predictions across industrial, indoor, wildfire, vehicle, and complex scenes

🏭 Industrial fire scenes
🏠 Indoor / house fire
🌲 Forest & wildfire
🚗 Vehicle fire
⚠️ Complex & visually ambiguous scenes — overlapping fire/smoke, background clutter, extreme lighting

Comparison with Competing Methods

PhysSeg achieves superior boundary precision and complete instance coverage across scales for both fire and smoke.
Zero-shot generalisation across five datasets (BoWFire, BA-UAV, FESB, MLID, IoT-Fire) without fine-tuning.
PhysSeg predictions vs. ground truth — fire instances across varying scale and illumination.

Video Demos

Input video (left)  vs.  PhysSeg segmentation output (right)

Citation

If you find this work useful, please cite:

@article{ayaz2026physseg,
  title   = {PhysSeg: Physics-Informed Multi-Scale Pattern Encoding
             for Fire and Smoke Instance Segmentation},
  author  = {Ayaz, Muhammad and Amin, Sareer ul and Yar, Hikmat
             and Khan, Salman and Shin, Wonseop and Seo, Sanghyun},
  journal = {Expert Systems with Applications},
  year    = {2026},
  publisher = {Elsevier}
}