Expert Systems with Applications  ·  Elsevier  ·  2026

PhysSeg: Physics-Informed Multi-Scale Pattern Encoding for Fire and Smoke Instance Segmentation

Muhammad Ayaz1, Sareer ul Amin1, Hikmat Yar2, Salman Khan3, Yonghoon Jung1, Sanghyun Seo1,*
1 Chung-Ang University, Republic of Korea  |  2 KAIST, Republic of Korea  |  3 Oxford Brookes University, United Kingdom
* Corresponding Author  ·  sanghyun@cau.ac.kr
PhysSeg is an end-to-end physics-informed instance segmentation framework. It unifies a Multi-Scale Pattern Encoder (MSPE), which applies class-specific small-scale kernels (3×3) for fire's sharp boundaries and large-scale kernels (5×5, 7×7) for smoke's diffuse gradients, with an Adaptive Fusion Module (AFM) that fuses the two branches via pixel-wise spatial attention, achieving 40.2% AP at 17.2 FPS on a new 7,917-image dual-class benchmark.

Abstract

Fire and smoke incidents pose serious threats to human life, critical infrastructure, and ecosystems, creating an urgent need for accurate and timely detection systems. Existing methods face major challenges in instance segmentation due to the distinct visual characteristics of the two hazard classes: fire exhibits sharp boundaries driven by combustion dynamics, whereas smoke presents diffuse, gradually varying patterns influenced by atmospheric conditions, scale, and illumination. Current approaches employ uniform feature extraction for both classes, failing to exploit these physical differences and consequently producing inaccurate instance masks in complex real-world scenarios. To address these limitations, we propose PhysSeg, a physics-informed multi-scale pattern segmentation network comprising two complementary components: a Multi-Scale Pattern Encoder (MSPE), which applies class-specific convolution kernels to independently capture fire's sharp boundary structure and smoke's diffuse spatial gradients, and an Adaptive Fusion Module (AFM), which dynamically integrates the dual-pattern features through spatial attention. We further introduce a large-scale dual-class dataset of 7,917 polygon-annotated images spanning indoor, outdoor, daytime, nighttime, and unmanned aerial vehicle (UAV) environments, establishing a comprehensive benchmark for fire and smoke instance segmentation. Experimental results demonstrate that PhysSeg achieves 40.2% mask average precision (AP), outperforming PointRend by +5.2%, Mask2Former by +5.4%, and CondInst by +6.0%, while maintaining 17.2 frames per second (FPS). Zero-shot evaluation on three external benchmarks achieves 34.4% mean AP, exceeding all five competing methods including Mask2Former with a Swin-Tiny backbone by +2.3%, confirming robust generalisation to unseen fire and smoke domains.

Highlights

Multi-Scale Pattern Encoder

Class-specific small-scale kernels (3×3) for fire's sharp boundaries and large-scale kernels (5×5, 7×7) for smoke's diffuse gradients — applied across all FPN levels.
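The dual-branch idea above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' released code: the module name, channel width, and exact branch depths are assumptions; only the kernel scales (3×3 for fire, 5×5 and 7×7 for smoke) come from the paper.

```python
import torch
import torch.nn as nn

class MultiScalePatternEncoder(nn.Module):
    """Hypothetical sketch of the MSPE applied to one FPN level:
    a small-kernel fire branch and a larger-kernel smoke branch."""

    def __init__(self, channels=256):
        super().__init__()
        # Fire branch: 3x3 kernels for sharp combustion boundaries.
        self.fire_branch = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Smoke branch: 5x5 then 7x7 kernels for diffuse spatial gradients.
        self.smoke_branch = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=7, padding=3),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # Same input, two class-specialised feature maps of identical shape.
        return self.fire_branch(x), self.smoke_branch(x)
```

Padding is chosen so both branches preserve spatial resolution, which lets the downstream fusion module combine them pixel by pixel.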

Adaptive Fusion Module

Pixel-wise softmax-normalized spatial attention weights dynamically integrate complementary fire and smoke features for precise instance-level discrimination.
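A pixel-wise softmax fusion of two feature branches could look like the following sketch. The 1×1 attention head and its input concatenation are assumptions; the source only states that the AFM produces softmax-normalised per-pixel weights over the fire and smoke branches.

```python
import torch
import torch.nn as nn

class AdaptiveFusionModule(nn.Module):
    """Sketch of the AFM: predict two logits per pixel and take a
    softmax-weighted (convex) combination of the two branches."""

    def __init__(self, channels=256):
        super().__init__()
        # One attention logit per branch (fire, smoke) at every pixel.
        self.attn = nn.Conv2d(2 * channels, 2, kernel_size=1)

    def forward(self, fire_feat, smoke_feat):
        logits = self.attn(torch.cat([fire_feat, smoke_feat], dim=1))
        w = torch.softmax(logits, dim=1)  # (B, 2, H, W), weights sum to 1
        # Broadcast the per-pixel weights across all channels.
        return w[:, 0:1] * fire_feat + w[:, 1:2] * smoke_feat
```

Because the weights are softmax-normalised, the fused map is everywhere a convex combination of the two inputs, so fire-dominated pixels can lean on the sharp-boundary branch while smoke regions lean on the diffuse branch.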

40.2% AP · 17.2 FPS

+5.2% AP over PointRend, +5.4% over Mask2Former, +6.0% over CondInst with real-time throughput on a single GPU.

Physics-Informed Design

Kernel scale selection guided by the physical properties of fire (combustion dynamics, sharp edges) and smoke (atmospheric diffusion, gradual gradients).

7,917-Image Dataset

First large-scale dual-class instance segmentation benchmark for fire and smoke with polygon-level annotations across indoor, outdoor, daytime, nighttime, and UAV scenes.

Zero-Shot Generalisation

34.4% mean AP on three unseen external benchmarks — outperforming all five state-of-the-art competitors without any fine-tuning.

Architecture


Overall architecture of PhysSeg. The framework comprises a DCNv2-enhanced ResNet-50 backbone, Feature Pyramid Network (FPN), Multi-Scale Pattern Encoder (MSPE) with parallel fire and smoke branches, Adaptive Fusion Module (AFM) with spatial attention, and CondInst dynamic mask head.
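The per-level flow described above can be summarised in a toy sketch. The DCNv2 backbone, FPN, and CondInst mask head are stubbed out; the module and kernel choices here are illustrative assumptions, showing only how the pattern branches and attention fusion chain together over FPN levels.

```python
import torch
import torch.nn as nn

class PhysSegNeck(nn.Module):
    """Toy sketch of the PhysSeg neck: for each FPN level, run parallel
    fire/smoke pattern branches, then fuse them with pixel-wise
    softmax attention before the (stubbed) CondInst mask head."""

    def __init__(self, channels=256):
        super().__init__()
        self.fire = nn.Conv2d(channels, channels, 3, padding=1)   # sharp boundaries
        self.smoke = nn.Conv2d(channels, channels, 7, padding=3)  # diffuse gradients
        self.attn = nn.Conv2d(2 * channels, 2, 1)                 # per-pixel fusion weights

    def forward(self, fpn_levels):
        fused = []
        for feat in fpn_levels:
            ff = torch.relu(self.fire(feat))
            sf = torch.relu(self.smoke(feat))
            w = torch.softmax(self.attn(torch.cat([ff, sf], dim=1)), dim=1)
            fused.append(w[:, 0:1] * ff + w[:, 1:2] * sf)
        return fused  # would feed the dynamic mask head
```

Applying the same shared branches at every pyramid level matches the paper's claim that the pattern encoding operates across all FPN levels.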

Dataset Overview

7,917 images · Dual-class instance annotations (fire + smoke) · Polygon masks · Indoor / Outdoor / Industrial / Wildfire / Vehicle / UAV scenes

7,917 Total Images · 2 Classes · Instance-Level Annotations · Polygon Masks
Dataset overview — diverse conditions across all scene categories

Dataset Comparison

Dataset        | Images | Fire | Smoke | Mask Type | Level    | Year
---------------|--------|------|-------|-----------|----------|-----
BoWFire        | 226    |      |       | Binary    | Semantic | 2015
FESB MLID      | 400    |      |       | Binary    | Semantic | 2017
SMOKE5K        | 5,400  |      |       | Binary    | Semantic | 2022
FSSD           | 1,968  |      |       | Binary    | Semantic | 2023
FLAME          | 2,003  |      |       | Binary    | Semantic | 2023
IoT-Fire       | 1,100  |      |       | Binary    | Semantic | 2023
Ours (PhysSeg) | 7,917  | ✓    | ✓     | Polygon   | Instance | 2026

PhysSeg Visual Results — Diverse Conditions & Scenarios

Model predictions across industrial, indoor, wildfire, vehicle, and complex scenes

🏭 Industrial fire scenes
🏠 Indoor / house fire
🌲 Forest & wildfire
🚗 Vehicle fire
⚠️ Complex & visually ambiguous scenes — overlapping fire/smoke, background clutter, extreme lighting

Comparison with Competing Methods

PhysSeg achieves superior boundary precision and complete instance coverage across scales for both fire and smoke.
Zero-shot generalisation across five datasets (BoWFire, BA-UAV, FESB, MLID, IoT-Fire) without fine-tuning.
PhysSeg predictions vs. ground truth — fire instances across varying scale and illumination.

Video Demos

Input video (left)  vs.  PhysSeg segmentation output (right)

Citation

If you find this work useful, please cite:

@article{ayaz2026physseg,
  title   = {PhysSeg: Physics-Informed Multi-Scale Pattern Encoding
             for Fire and Smoke Instance Segmentation},
  author  = {Ayaz, Muhammad and Amin, Sareer ul and Yar, Hikmat
             and Khan, Salman and Shin, Wonseop and Seo, Sanghyun},
  journal = {Expert Systems with Applications},
  year    = {2026},
  publisher = {Elsevier}
}