EGU21-12384, updated on 09 Jan 2023
EGU General Assembly 2021
© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

AI-Ready Training Datasets for Earth Observation: Enabling FAIR data principles for EO training data.

Alastair McKinstry1,2, Oisin Boydell3,4, Quan Le3,4, Inder Preet3,4, Jennifer Hanafin1,2, Manuel Fernandez1,2, Adam Warde1,2, Venkatesh Kannan1,2, and Patrick Griffiths5
  • 1NUI Galway, Irish Centre for High-End Computing, Galway, Ireland
  • 2Irish Centre for High End Computing, NUI Galway, Galway, Ireland
  • 3University College Dublin, Belfield, Dublin 4, Ireland
  • 4CeADAR, UCD, Ireland
  • 5ESA ESRIN, Via Galileo Galilei, 1, 00044 Frascati RM, Italy

The ESA-funded AIREO project [1] sets out to produce AI-ready training dataset specifications and best practices to support the training and development of machine learning models on Earth Observation (EO) data. While the quality and quantity of EO data have increased drastically over the past decades, the availability of training data for machine learning applications is considered a major bottleneck. The goal is to move towards implementing FAIR data principles for training data in EO, enhancing especially the findability, interoperability and reusability aspects. To achieve this goal, AIREO sets out to provide a training data specification and to develop best practices for the use of training datasets in EO. An additional goal is to make training datasets self-explanatory (“AI-ready”) in order to expose challenging problems to a wider audience that does not have expert geospatial knowledge.

Key elements addressed in the AIREO specification are granular and interoperable metadata (based on STAC), innovative Quality Assurance metrics, data provenance and processing history, as well as integrated feature engineering recipes that optimize platform independence. Several initial pilot datasets are being developed following the AIREO data specifications. These pilot applications include, for example, forest biomass, sea ice detection and the estimation of atmospheric parameters. An API for the easy exploitation of these datasets will allow the Training Datasets (TDS) to work against EO catalogs (based on OGC STAC catalogs and best practices from the ML community), so that datasets can be updated and models retrained over time.
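
As a rough illustration of what STAC-based metadata for a single training sample might look like, the sketch below builds a minimal STAC-style Item (a GeoJSON Feature) as a plain Python dictionary. It is not taken from the AIREO specification: the function name `make_training_item`, the `example:` property names, the asset keys, the placeholder values and the example.com URLs are all assumptions made for illustration. Only the top-level Item layout (`type`, `stac_version`, `id`, `geometry`, `bbox`, `properties`, `assets`, `links`) follows the STAC Item structure.

```python
import json


def make_training_item(item_id, bbox, datetime_utc, data_href, label_href):
    """Build a minimal STAC-style Item describing one training sample.

    bbox is (west, south, east, north) in degrees. The "example:" properties
    are hypothetical placeholders and are NOT defined by the AIREO specification.
    """
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": [west, south, east, north],
        # Footprint of the sample as a closed polygon ring derived from the bbox.
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south],
                [east, south],
                [east, north],
                [west, north],
                [west, south],
            ]],
        },
        "properties": {
            "datetime": datetime_utc,
            # Hypothetical quality-assurance and provenance fields (placeholder values).
            "example:quality_score": 0.9,
            "example:processing_history": ["resampled", "cloud masked"],
        },
        # One asset for the EO input data, one for the reference labels.
        "assets": {
            "data": {"href": data_href, "roles": ["data"]},
            "labels": {"href": label_href, "roles": ["labels"]},
        },
        "links": [],
    }


item = make_training_item(
    item_id="sample-0001",
    bbox=(-9.5, 53.0, -9.0, 53.5),
    datetime_utc="2020-06-01T00:00:00Z",
    data_href="https://example.com/sample-0001/data.tif",
    label_href="https://example.com/sample-0001/labels.tif",
)
print(json.dumps(item, indent=2))
```

Keeping data and labels as separate assets of one record is one possible design choice: it lets quality and provenance fields travel with each sample, in the spirit of the granular metadata described above.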


This presentation will introduce the first version of the AIREO training dataset specification and showcase some elements of the best practices that were developed. The AIREO-compliant pilot datasets, which are openly accessible, will also be presented, and community feedback is explicitly encouraged.


How to cite: McKinstry, A., Boydell, O., Le, Q., Preet, I., Hanafin, J., Fernandez, M., Warde, A., Kannan, V., and Griffiths, P.: AI-Ready Training Datasets for Earth Observation: Enabling FAIR data principles for EO training data., EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12384, 2021.

