Atmospheric turbulence distorts visual imagery, posing significant challenges for information interpretation by both humans and machines. Traditional approaches to mitigating atmospheric turbulence, such as CLEAR, are predominantly model-based; they are computationally intensive and memory-demanding, making real-time operation impractical. In contrast, deep learning-based methods have garnered increasing attention but are currently effective primarily for static scenes. This project proposes novel learning-based frameworks specifically designed to support dynamic scenes.
Our objectives are twofold: (i) to develop real-time video restoration techniques that mitigate spatio-temporal distortions, enhancing the visual interpretation of scenes for human observers, and (ii) to support decision-making by implementing and evaluating real-time object recognition and tracking using the restored video.
MAMAT is a novel Mamba-based method comprising two modules: the first employs deformable 3D convolutions for non-rigid registration to reduce spatial shifts, while the second enhances contrast and detail. By leveraging the capabilities of the 3D Mamba architecture, MAMAT outperforms state-of-the-art learning-based methods in our experiments.
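As a rough illustration of the registration stage, the sketch below resamples a distorted clip with a learned per-voxel offset field, approximating the effect of a deformable 3D convolution. The module, layer, and tensor names are our own assumptions for exposition, not MAMAT's released code.

```python
# Minimal sketch of deformable 3D resampling for non-rigid registration.
# Assumptions: offsets are predicted in normalized [-1, 1] coordinates,
# and a single offset head stands in for a full deformable 3D conv stack.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Deformable3DAlign(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Predict a (dt, dy, dx) offset field from the distorted clip.
        self.offset_head = nn.Conv3d(channels, 3, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, T, H, W) turbulence-distorted video clip.
        b, c, t, h, w = x.shape
        offsets = self.offset_head(x)                         # (B, 3, T, H, W)
        # Identity sampling grid in normalized [-1, 1] coordinates.
        zs = torch.linspace(-1, 1, t, device=x.device)
        ys = torch.linspace(-1, 1, h, device=x.device)
        xs = torch.linspace(-1, 1, w, device=x.device)
        gz, gy, gx = torch.meshgrid(zs, ys, xs, indexing="ij")
        base = torch.stack((gx, gy, gz), dim=-1)              # (T, H, W, 3)
        base = base.unsqueeze(0).expand(b, -1, -1, -1, -1)
        # Reorder predicted (dt, dy, dx) offsets to grid_sample's (x, y, z).
        delta = offsets.permute(0, 2, 3, 4, 1)[..., [2, 1, 0]]
        return F.grid_sample(x, base + delta, align_corners=True)
```

In practice the learned offsets pull each voxel back toward its undistorted position, so downstream enhancement operates on spatially aligned frames.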
RMFAT, a Recurrent Multi-scale Feature Atmospheric Turbulence Mitigator, restores videos efficiently and consistently. It uses a lightweight two-input recurrent framework in which multi-scale feature encoding and temporal warping enhance spatial detail and temporal coherence.
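A minimal sketch of the recurrence is given below: the previous restored frame is temporally warped to the current frame with an estimated flow, then fused with it. The flow estimator and fusion network are placeholders for lightweight CNNs; their names and interfaces are assumptions, not RMFAT's actual components.

```python
# Sketch of a two-input recurrent restoration step with temporal warping.
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp `frame` (B, C, H, W) by optical flow (B, 2, H, W)."""
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=frame.device),
        torch.arange(w, device=frame.device), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float() + flow     # (B, 2, H, W)
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    gx = 2.0 * grid[:, 0] / (w - 1) - 1.0
    gy = 2.0 * grid[:, 1] / (h - 1) - 1.0
    return F.grid_sample(frame, torch.stack((gx, gy), dim=-1),
                         align_corners=True)

class RecurrentStep(nn.Module):
    def __init__(self, flow_net: nn.Module, fuse_net: nn.Module):
        super().__init__()
        self.flow_net, self.fuse_net = flow_net, fuse_net  # placeholder CNNs

    def forward(self, curr: torch.Tensor, prev_restored: torch.Tensor):
        # Estimate motion between the previous output and the current frame.
        flow = self.flow_net(torch.cat((curr, prev_restored), dim=1))
        aligned = warp(prev_restored, flow)                # temporal warping
        # Fuse the warped history with the current observation.
        return self.fuse_net(torch.cat((curr, aligned), dim=1))
```

Propagating only the previous restored frame keeps the per-step memory footprint small, which is what makes this style of recurrence attractive for real-time use.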
JDATT is a knowledge distillation framework designed to reduce model size and improve inference speed. We introduce a joint end-to-end training strategy that preserves image quality through a reconstruction loss, a Channel-Wise Distillation loss, and a Masked Generative Distillation loss, while maintaining detection performance via a detection loss and Kullback–Leibler divergence.
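The sketch below combines three of these terms: L1 reconstruction, Channel-Wise Distillation between teacher and student feature maps, and KL divergence between their detection logits. The loss weights and temperature are illustrative assumptions rather than the paper's values, and the Masked Generative Distillation term and supervised detection loss are omitted for brevity.

```python
# Hedged sketch of a joint distillation objective in the spirit of JDATT.
import torch
import torch.nn.functional as F

def cwd_loss(f_s: torch.Tensor, f_t: torch.Tensor, tau: float = 4.0):
    """Channel-Wise Distillation: per-channel spatial softmax, then KL."""
    b, c, _, _ = f_s.shape
    log_p_s = F.log_softmax(f_s.view(b, c, -1) / tau, dim=-1)
    p_t = F.softmax(f_t.view(b, c, -1) / tau, dim=-1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * tau ** 2

def joint_loss(restored_s, gt, feat_s, feat_t, logits_s, logits_t,
               w_rec=1.0, w_cwd=1.0, w_kd=1.0):
    rec = F.l1_loss(restored_s, gt)                    # reconstruction loss
    cwd = cwd_loss(feat_s, feat_t)                     # feature distillation
    kd = F.kl_div(F.log_softmax(logits_s, dim=-1),     # detection logits
                  F.softmax(logits_t, dim=-1),
                  reduction="batchmean")
    return w_rec * rec + w_cwd * cwd + w_kd * kd
```

Training the restoration and detection branches against a single combined objective is what makes the distillation "joint": the compact student is optimized for both visual quality and downstream detection at once.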
The DeTurb framework combines a geometric restoration stage with an enhancement stage. Random perturbations and geometric distortions are corrected using a pyramid architecture with deformable 3D convolutions, producing aligned frames. These frames are then reconstructed into a sharp, clear image by a multi-scale 3D Swin Transformer architecture.
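The two-stage structure can be summarized as the small composition below. Both submodules are placeholders for the pyramid alignment network and the 3D Swin reconstruction network; the class and argument names are assumptions, not DeTurb's released API.

```python
# Sketch of DeTurb's align-then-reconstruct pipeline.
import torch
import torch.nn as nn

class DeTurbPipeline(nn.Module):
    def __init__(self, align: nn.Module, reconstruct: nn.Module):
        super().__init__()
        self.align = align              # pyramid of deformable 3D convs
        self.reconstruct = reconstruct  # multi-scale 3D Swin Transformer

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (B, C, T, H, W) turbulence-distorted frames.
        aligned = self.align(clip)        # correct geometric distortion
        return self.reconstruct(aligned)  # fuse into a sharp, clear output
```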