EGU26-3405, updated on 13 Mar 2026
https://doi.org/10.5194/egusphere-egu26-3405
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Friday, 08 May, 09:35–09:45 (CEST)
 
Room -2.62
SatWellMCQ: A Vision–Language Satellite Dataset for MCQ-Based Image Grounding of Oil Wells
Ahmed Emam1, Sultan Alrowili1, Mathan K. Eswaran1, Romeo Kinzler2, and Younes Samih3
  • 1IBM Research, Riyadh, Saudi Arabia
  • 2IBM Research, Zurich, Switzerland
  • 3IBM Research, Abu Dhabi, United Arab Emirates

Monitoring oil and gas wells is essential for assessing environmental degradation and long-term impacts such as methane emissions from abandoned and orphaned wells. Satellite imagery combined with machine learning offers scalable capabilities for detecting and characterizing oil and gas infrastructure, yet progress remains constrained by the lack of multimodal, multiple-choice (MCQ) vision-language datasets that enable structured evaluation and post-training of vision-language models (VLMs) for oil well scene grounding. Existing resources are predominantly visual-only and therefore provide limited support for image grounding from satellite imagery.

To address this gap, we introduce SatWellMCQ, a vision-language dataset of expert-verified satellite imagery paired with natural-language descriptions and multiple-choice supervision for image-grounded identification and localization of oil wells. SatWellMCQ uses high-resolution multispectral Planet imagery (RGB and infrared) and text annotations that describe well type and spatial context. Each sample includes one expert-verified correct description and three semantically plausible distractor descriptions drawn from other samples, enabling structured MCQ evaluation. All samples were manually verified by a senior domain expert with 100% intra-expert agreement, ensuring accurate alignment between images, labels, and text. The dataset covers four categories relevant to oil well monitoring: active wells, suspended wells, abandoned wells, and control samples without visible wells, yielding a balanced distribution for training and evaluation. We publicly release SatWellMCQ to support research on image grounding and vision-language adaptation in satellite imagery of oil wells.
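
For illustration, a single SatWellMCQ sample might be represented as the following record (a minimal sketch only; the keys, file naming, and option texts are assumptions for illustration, not the released schema):

    # Hypothetical layout of one MCQ sample; field names are illustrative, not the official schema.
    sample = {
        "image": "planet_tile_0001.tif",   # Planet multispectral chip (RGB and infrared)
        "category": "abandoned",           # one of: active, suspended, abandoned, control
        "options": [
            "Abandoned well pad with degraded access road and no surface equipment.",  # expert-verified description
            "Active well site with visible pumpjack and maintained gravel pad.",       # distractor from another sample
            "Suspended well with capped wellhead and partially revegetated pad.",      # distractor from another sample
            "Undisturbed terrain with no visible oil and gas infrastructure.",         # distractor from another sample
        ],
        "answer_index": 0,                 # position of the correct description
    }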

We evaluate SatWellMCQ across state-of-the-art VLMs in zero-shot and supervised fine-tuning (SFT) settings. In the zero-shot setup, performance is moderate only for large-scale models, with the best result achieved by Qwen3-VL-235B at 0.670 accuracy. Compact models transfer poorly in zero-shot evaluation (e.g., Granite 3.3 2B at 0.422 and Phi-4-multimodal-instruct 6B at 0.376), highlighting the difficulty of domain-specific oil well analysis without targeted supervision. Supervised fine-tuning on SatWellMCQ yields substantial gains for compact models: Granite 3.3 2B improves to 0.722 and Phi-4-multimodal-instruct 6B reaches 0.730, surpassing all zero-shot baselines. These results show that SatWellMCQ poses a challenging benchmark for current VLMs while enabling effective domain adaptation through structured MCQ supervision.
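
As an illustration of the MCQ evaluation protocol, accuracy reduces to checking, for each sample, whether the model selects the expert-verified description among the four candidates (a minimal sketch assuming the hypothetical sample layout above; answer_mcq is a placeholder interface, not the API of any particular VLM):

    from typing import Callable, Sequence

    def mcq_accuracy(samples: Sequence[dict],
                     answer_mcq: Callable[[str, Sequence[str]], int]) -> float:
        """Fraction of samples where the model's chosen option is the correct description.

        answer_mcq(image_path, options) returns the index of the option the model selects.
        """
        correct = sum(int(answer_mcq(s["image"], s["options"]) == s["answer_index"])
                      for s in samples)
        return correct / len(samples) if samples else 0.0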

Overall, SatWellMCQ provides a resource for post-training and benchmarking VLMs on image grounding of oil wells in satellite imagery and supports geoscientific monitoring tasks relevant to environmental impact assessment and methane mitigation.

How to cite: Emam, A., Alrowili, S., Eswaran, M. K., Kinzler, R., and Samih, Y.: SatWellMCQ: A Vision–Language Satellite Dataset for MCQ-Based Image Grounding of Oil Wells, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-3405, https://doi.org/10.5194/egusphere-egu26-3405, 2026.