Effective Variance Attention-enhanced Diffusion Model (EVADM) for Crop Field Aerial Image Super Resolution
This repository contains the models, methods, and data developed in the paper:
Effective variance attention-enhanced diffusion model for crop field aerial image super resolution, published in ISPRS Journal of Photogrammetry and Remote Sensing.
ResearchGate: ResearchGate Article
The Effective Variance Attention-enhanced Diffusion Model (EVADM) is designed to enhance the resolution and quality of aerial imagery, with a particular focus on high-resolution cropland images. By building on diffusion models (DM) and introducing the Variance-Average-Spatial Attention (VASA) mechanism, EVADM significantly improves performance on image super-resolution (SR) tasks.
- Development of the CropSR Dataset: Created a high-resolution aerial image dataset, namely CropSR, with over 321,000 samples for self-supervised SR training.
- Introduction of Variance-Average-Spatial Attention (VASA): Designed a novel attention mechanism inspired by the trend of decreasing image variance with increasing flight altitude, enhancing SR model performance.
- Efficient VASA-enhanced Diffusion Model (EVADM): Developed a robust model that leverages VASA to improve the quality of aerial imagery super-resolution.
- Comprehensive Evaluation Metrics: Introduced the Super-Resolution Relative Fidelity Index (SRFI) for a nuanced assessment of structural and perceptual similarities in SR outputs.
- Description: A high-resolution aerial image dataset comprising over 321,000 samples for self-supervised SR training.
- Description: A combined dataset constructed from matched orthomosaic mapping (CropSR-OR) and fixed-point photographs (CropSR-FP).
- Total Pairs: More than 5,000 pairs.
- The test datasets can be accessed at CropSR (for Crop Field Aerial Image Super Resolution), Mendeley Data.
- Achieved an FID reduction of 14.6 and a 27% boost in SRFI on the ×2 real SR datasets.
- Achieved an FID reduction of 8.0 and a 6% boost in SRFI on the ×4 real SR datasets.
EVADM has demonstrated superior generalization capabilities on the open Agriculture-Vision dataset, highlighting its robustness across various aerial imagery tasks.
The model's effectiveness is validated through ablation studies and feature-attention map analyses, providing insights into the mechanism of VASA and the SR process.
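The variance trend that motivates VASA (image variance decreasing as flight altitude increases) can be illustrated with a toy example. The sketch below is not the VASA module and does not use CropSR data. It assumes that averaging non-overlapping pixel blocks is a crude stand-in for viewing the same field from a higher altitude, and the helper names `block_average` and `pixel_variance` are hypothetical.

```python
import random
import statistics


def block_average(image, k):
    """Average non-overlapping k x k blocks of a 2D list (assumed proxy for a coarser view)."""
    rows, cols = len(image), len(image[0])
    return [
        [
            sum(image[r + dr][c + dc] for dr in range(k) for dc in range(k)) / (k * k)
            for c in range(0, cols - cols % k, k)
        ]
        for r in range(0, rows - rows % k, k)
    ]


def pixel_variance(image):
    """Population variance over all pixels of a 2D list."""
    return statistics.pvariance([v for row in image for v in row])


random.seed(0)
# Hypothetical 32 x 32 grayscale patch with values in [0, 1); not CropSR data.
patch = [[random.random() for _ in range(32)] for _ in range(32)]

# Larger block sizes mimic coarser ground sampling; variance shrinks as detail is averaged out.
variances = {k: pixel_variance(block_average(patch, k)) for k in (1, 2, 4)}
for k, v in variances.items():
    print(f"block size {k}: variance {v:.4f}")
```

With equal-sized blocks, the variance of the block means can never exceed the variance of the original pixels, so the printed values fall as the block size grows. How EVADM actually turns this statistic into attention weights is described in the paper, not here.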
EVADM offers a promising approach for realistic aerial imagery super-resolution, showcasing high practicality for various downstream applications in agriculture and beyond.
All models were implemented in Python with the PyTorch framework and trained on an NVIDIA RTX 4090 GPU. EVADM is based on the LDM (Rombach et al., 2022), so please refer to both the EVADM and LDM setup instructions. Download the weights from weights into the EVADM/weights/ folder.
Change into the EVADM/ folder and run the EVADM SR usage demo:
python eva101_EVADM_infer.py
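Before launching the demo, it can help to confirm that the weights were actually downloaded. The following is a minimal sketch of such a pre-flight check, not part of the repository: `list_weight_files` and `run_demo` are hypothetical helpers, and the check only assumes that the weights folder should contain at least one file (the expected file names are not specified here).

```python
import subprocess
import sys
from pathlib import Path


def list_weight_files(weights_dir):
    """Return the sorted regular files directly inside weights_dir (empty list if missing)."""
    path = Path(weights_dir)
    if not path.is_dir():
        return []
    return sorted(p for p in path.iterdir() if p.is_file())


def run_demo(repo_dir="EVADM"):
    """Run the inference demo only when the weights folder is populated."""
    repo = Path(repo_dir)
    if not list_weight_files(repo / "weights"):
        print(f"No weight files found in {repo / 'weights'}; download them first.")
        return False
    # Same command as above, launched from inside the EVADM/ folder.
    subprocess.run([sys.executable, "eva101_EVADM_infer.py"], cwd=repo, check=True)
    return True
```

Calling `run_demo()` from the directory that contains EVADM/ either starts the demo or reports that the weights are missing.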
To calculate the SRFI metric, see:
eva102_SRFI_metrics.py
This project is licensed under the MIT License. See the LICENSE file for details.
We thank all reviewers for their constructive feedback, which greatly contributed to the improvement of this project.