EGU26-3405, updated on 13 Mar 2026
https://doi.org/10.5194/egusphere-egu26-3405
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Friday, 08 May, 09:35–09:45 (CEST)
 
Room -2.62
SatWellMCQ: A Vision–Language Satellite Dataset for MCQ-Based Image Grounding of Oil Wells
Ahmed Emam1, Sultan Alrowili1, Mathan K. Eswaran1, Romeo Kinzler2, and Younes Samih3
  • 1IBM Research, Riyadh, Saudi Arabia
  • 2IBM Research, Zurich, Switzerland
  • 3IBM Research, Abu Dhabi, United Arab Emirates

Monitoring oil and gas wells is essential for assessing environmental degradation and long-term impacts such as methane emissions from abandoned and orphaned wells. Satellite imagery combined with machine learning offers scalable capabilities for detecting and characterizing oil and gas infrastructure, yet progress remains constrained by the lack of multimodal, multiple-choice (MCQ) vision-language datasets that enable structured evaluation and post-training of vision-language models (VLMs) for oil well scene grounding. Existing resources are predominantly visual-only and therefore provide limited support for image grounding from satellite imagery.

To address this gap, we introduce SatWellMCQ, a vision-language dataset of expert-verified satellite imagery paired with natural-language descriptions and multiple-choice supervision for image-grounded identification and localization of oil wells. SatWellMCQ uses high-resolution multispectral Planet imagery (RGB and infrared) and text annotations that describe well type and spatial context. Each sample includes one expert-verified correct description and three semantically plausible distractor descriptions drawn from other samples, enabling structured MCQ evaluation. All samples were manually verified by a senior domain expert with 100% intra-expert agreement, ensuring accurate alignment between images, labels, and text. The dataset covers four categories relevant to oil well monitoring: active wells, suspended wells, abandoned wells, and control samples without visible wells, yielding a balanced distribution for training and evaluation. We publicly release SatWellMCQ to support research on image grounding and vision-language adaptation in satellite imagery of oil wells.
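
For illustration, a single SatWellMCQ sample might be represented as the following record (a minimal sketch only; the keys, file naming, and option texts are assumptions for illustration, not the released schema):

    # Hypothetical layout of one MCQ sample; field names are illustrative, not the official schema.
    sample = {
        "image": "planet_tile_0001.tif",   # Planet multispectral chip (RGB and infrared)
        "category": "abandoned",           # one of: active, suspended, abandoned, control
        "options": [
            "Abandoned well pad with degraded access road and no surface equipment.",  # expert-verified description
            "Active well site with visible pumpjack and maintained gravel pad.",       # distractor from another sample
            "Suspended well with capped wellhead and partially revegetated pad.",      # distractor from another sample
            "Undisturbed terrain with no visible oil and gas infrastructure.",         # distractor from another sample
        ],
        "answer_index": 0,                 # position of the correct description
    }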

We evaluate SatWellMCQ across state-of-the-art VLMs in zero-shot and supervised fine-tuning (SFT) settings. In the zero-shot setup, performance is moderate only for large-scale models, with the best result achieved by Qwen3-VL-235B at 0.670 accuracy. Compact models transfer poorly in zero-shot evaluation (e.g., Granite 3.3 2B at 0.422 and Phi-4-multimodal-instruct 6B at 0.376), highlighting the difficulty of domain-specific oil well analysis without targeted supervision. Supervised fine-tuning on SatWellMCQ yields substantial gains for compact models: Granite 3.3 2B improves to 0.722 and Phi-4-multimodal-instruct 6B reaches 0.730, surpassing all zero-shot baselines. These results show that SatWellMCQ poses a challenging benchmark for current VLMs while enabling effective domain adaptation through structured MCQ supervision.
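
As an illustration of the MCQ evaluation protocol, accuracy reduces to checking, for each sample, whether the model selects the expert-verified description among the four candidates (a minimal sketch assuming the hypothetical sample layout above; answer_mcq is a placeholder interface, not the API of any particular VLM):

    from typing import Callable, Sequence

    def mcq_accuracy(samples: Sequence[dict],
                     answer_mcq: Callable[[str, Sequence[str]], int]) -> float:
        """Fraction of samples where the model's chosen option is the correct description.

        answer_mcq(image_path, options) returns the index of the option the model selects.
        """
        correct = sum(int(answer_mcq(s["image"], s["options"]) == s["answer_index"])
                      for s in samples)
        return correct / len(samples) if samples else 0.0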

Overall, SatWellMCQ provides a resource for post-training and benchmarking VLMs on image grounding of oil wells in satellite imagery and supports geoscientific monitoring tasks relevant to environmental impact assessment and methane mitigation.

How to cite: Emam, A., Alrowili, S., Eswaran, M. K., Kinzler, R., and Samih, Y.: SatWellMCQ: A Vision–Language Satellite Dataset for MCQ-Based Image Grounding of Oil Wells, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-3405, https://doi.org/10.5194/egusphere-egu26-3405, 2026.