Fire and smoke incidents pose serious threats to human life, critical infrastructure, and ecosystems, creating an urgent need for accurate and timely detection systems. Existing methods face major challenges in instance segmentation due to the distinct visual characteristics of the two hazard classes: fire exhibits sharp boundaries driven by combustion dynamics, whereas smoke presents diffuse, gradually varying patterns influenced by atmospheric conditions, scale, and illumination. Current approaches employ uniform feature extraction for both classes, failing to exploit these physical differences and consequently producing inaccurate instance masks in complex real-world scenarios. To address these limitations, we propose PhysSeg, a physics-informed multi-scale pattern segmentation network comprising two complementary components: a Multi-Scale Pattern Encoder (MSPE), which applies class-specific convolution kernels to independently capture fire's sharp boundary structure and smoke's diffuse spatial gradients, and an Adaptive Fusion Module (AFM), which dynamically integrates the dual-pattern features through spatial attention. We further introduce a large-scale dual-class dataset of 7,917 polygon-annotated images spanning indoor, outdoor, daytime, nighttime, and unmanned aerial vehicle (UAV) environments, establishing a comprehensive benchmark for fire and smoke instance segmentation. Experimental results demonstrate that PhysSeg achieves 40.2% mask average precision (AP), outperforming PointRend by +5.2%, Mask2Former by +5.4%, and CondInst by +6.0%, while maintaining 17.2 frames per second (FPS). Zero-shot evaluation on three external benchmarks achieves 34.4% mean AP, exceeding all five competing methods including Mask2Former with a Swin-Tiny backbone by +2.3%, confirming robust generalisation to unseen fire and smoke domains.
Class-specific small-scale kernels (3×3) for fire's sharp boundaries and large-scale kernels (5×5, 7×7) for smoke's diffuse gradients — applied across all FPN levels.
Pixel-wise softmax-normalized spatial attention weights dynamically integrate complementary fire and smoke features for precise instance-level discrimination.
+5.2% AP over PointRend, +5.4% over Mask2Former, +6.0% over CondInst with real-time throughput on a single GPU.
Kernel scale selection guided by the physical properties of fire (combustion dynamics, sharp edges) and smoke (atmospheric diffusion, gradual gradients).
First large-scale dual-class instance segmentation benchmark for fire and smoke with polygon-level annotations across indoor, outdoor, daytime, nighttime, and UAV scenes.
34.4% mean AP on three unseen external benchmarks — outperforming all five state-of-the-art competitors without any fine-tuning.
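The two core ideas above (class-specific kernel scales in the MSPE and pixel-wise softmax fusion in the AFM) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the toy feature map, the Laplacian-style 3×3 "fire" kernel, and the 7×7 averaging "smoke" kernel are illustrative stand-ins chosen to show why a small kernel responds to sharp boundaries while a large kernel captures diffuse gradients, and how softmax-normalised weights fuse the two responses per pixel.

```python
import math

def conv2d(img, kernel):
    """Naive 'same' 2D convolution with zero padding (illustration only)."""
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in range(kh):
                for dj in range(kw):
                    y, x = i + di - ph, j + dj - pw
                    if 0 <= y < h and 0 <= x < w:
                        s += img[y][x] * kernel[di][dj]
            out[i][j] = s
    return out

# Toy single-channel feature map: a sharp "fire-like" blob on a flat background.
feat = [[1.0 if 2 <= i <= 4 and 2 <= j <= 4 else 0.0 for j in range(8)]
        for i in range(8)]

# Fire branch: small 3x3 kernel (edge-sensitive Laplacian-style filter).
fire_kernel = [[0, -1, 0], [-1, 4, -1], [0, -1, 0]]
# Smoke branch: large 7x7 averaging kernel, capturing diffuse spatial context.
smoke_kernel = [[1 / 49.0] * 7 for _ in range(7)]

fire_feat = conv2d(feat, fire_kernel)    # peaks on the blob's boundary
smoke_feat = conv2d(feat, smoke_kernel)  # smooth, gradually varying response

def fuse(fire_feat, smoke_feat):
    """AFM-style fusion: pixel-wise softmax over the two branch responses."""
    h, w = len(fire_feat), len(fire_feat[0])
    fused = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            ef = math.exp(fire_feat[i][j])
            es = math.exp(smoke_feat[i][j])
            wf = ef / (ef + es)  # softmax-normalised attention weight
            fused[i][j] = wf * fire_feat[i][j] + (1 - wf) * smoke_feat[i][j]
    return fused

fused = fuse(fire_feat, smoke_feat)
```

On this toy input the small kernel fires only at the blob's edges (the Laplacian is zero over the flat interior), while the large kernel spreads a soft response across the neighbourhood, mirroring the fire/smoke contrast the highlights describe.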
Overall architecture of PhysSeg. The framework comprises a DCNv2-enhanced ResNet-50 backbone, Feature Pyramid Network (FPN), Multi-Scale Pattern Encoder (MSPE) with parallel fire and smoke branches, Adaptive Fusion Module (AFM) with spatial attention, and CondInst dynamic mask head.
7,917 images · Dual-class instance annotations (fire + smoke) · Polygon masks · Indoor / Outdoor / Industrial / Wildfire / Vehicle / UAV scenes
| Dataset | Images | Fire | Smoke | Mask Type | Level | Year |
|---|---|---|---|---|---|---|
| BoWFire | 226 | ✓ | — | Binary | Semantic | 2015 |
| FESB MLID | 400 | ✓ | ✓ | Binary | Semantic | 2017 |
| SMOKE5K | 5,400 | — | ✓ | Binary | Semantic | 2022 |
| FSSD | 1,968 | ✓ | ✓ | Binary | Semantic | 2023 |
| FLAME | 2,003 | ✓ | — | Binary | Semantic | 2023 |
| IoT-Fire | 1,100 | ✓ | — | Binary | Semantic | 2023 |
| Ours (PhysSeg) | 7,917 | ✓ | ✓ | Polygon | Instance | 2026 |
Model predictions across industrial, indoor, wildfire, vehicle, and complex scenes
Input video (left) vs. PhysSeg segmentation output (right)
If you find this work useful, please cite:
@article{ayaz2026physseg,
  title     = {PhysSeg: Physics-Informed Multi-Scale Pattern Encoding for Fire and Smoke Instance Segmentation},
  author    = {Ayaz, Muhammad and Amin, Sareer ul and Yar, Hikmat and Khan, Salman and Shin, Wonseop and Seo, Sanghyun},
  journal   = {Expert Systems with Applications},
  year      = {2026},
  publisher = {Elsevier}
}