- ¹School of Earth and Environment, University of Leeds, Leeds, UK
- ²National Centre for Atmospheric Science, Leeds, UK
- ³Cooperative Institute for Research in Environmental Sciences, University of Colorado, Boulder, Colorado, USA
- ⁴NOAA Physical Sciences Laboratory, Boulder, Colorado, USA
There are myriad methods for matching data between satellite and surface-based observations (co-location), and no single way to objectively compare the quality of the matching between methods. This work proposes a framework for selecting an optimised co-location scheme, and shows that the schemes it selects produce demonstrably better output data than other commonly used choices of co-location scheme.
Spatiotemporal co-location, the matching of data that are described on different spatial and temporal coordinates and retrieved from different sources, is an important step in any analysis utilising multiple sources of Earth observation data. For example, validating satellite data against surface-based remote sensing data often requires that the satellite data be spatially aggregated over its field of view near the surface-based observatory, and that the surface-based data be temporally aggregated around the time of the satellite overpass. A good data co-location retains enough data for subsequent analyses to be viable, whilst limiting the mismatch error induced by comparing data between sources with larger spatiotemporal separations. The schemes by which data are co-located are often parameterised by a few variables that can be arbitrarily selected (for example, the maximum distance between a surface-based observatory and the footprint of a satellite observation). The choice of these co-location parameters directly impacts all subsequent analyses, and there is no single correct method for selecting a parameterisation.
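A minimal sketch of such a parameterised co-location scheme is given below, to make the role of the free parameters concrete. It is an illustration, not the scheme used in this work. The function names, the two parameters (`max_dist_km` and `half_window_s`), the use of simple means for aggregation, and the representation of times as seconds on a shared clock are all assumptions made for the example.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, adequate for a sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def colocate(sat_lat, sat_lon, sat_time, sat_val,
             site_lat, site_lon, site_time, site_val,
             max_dist_km, half_window_s):
    """Pair one satellite overpass with one surface site (hypothetical scheme).

    Satellite samples within `max_dist_km` of the site are averaged in space.
    Surface samples within `half_window_s` of the mean time of those satellite
    samples are averaged in time. Returns (sat_mean, site_mean), or None when
    either selection is empty.
    """
    near = haversine_km(sat_lat, sat_lon, site_lat, site_lon) <= max_dist_km
    if not near.any():
        return None
    overpass_time = sat_time[near].mean()
    in_window = np.abs(site_time - overpass_time) <= half_window_s
    if not in_window.any():
        return None
    return float(sat_val[near].mean()), float(site_val[in_window].mean())


# Toy overpass: three satellite samples along the equator, site at (0, 0).
pair = colocate(
    sat_lat=np.array([0.0, 0.0, 0.0]),
    sat_lon=np.array([0.0, 0.1, 5.0]),
    sat_time=np.array([0.0, 10.0, 20.0]),
    sat_val=np.array([1.0, 3.0, 100.0]),
    site_lat=0.0, site_lon=0.0,
    site_time=np.array([-100.0, 0.0, 100.0, 1000.0]),
    site_val=np.array([2.0, 4.0, 6.0, 50.0]),
    max_dist_km=50.0, half_window_s=200.0,
)
```

Widening `max_dist_km` or `half_window_s` admits more samples into each pair and yields more overpasses with a valid pair, at the cost of comparing data that are further apart in space or time.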
We describe a data-driven approach for selecting an optimised co-location parameterisation that is domain- and data-agnostic. The mutual information between two data sources quantifies how much of the variability in data from one source can be explained by variability in data from the other. The presented approach selects the co-location parameters such that the data co-location maximises the mutual information between the data sources. The output is paired data between the sources that is as close as possible to being described by a one-to-one relationship, given the input data and co-location scheme.
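The selection step can be sketched as a search over candidate parameters that keeps the candidate with the highest estimated mutual information. For paired variables $X$ and $Y$, the mutual information is $I(X;Y) = \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}$. The sketch below uses a simple histogram (plug-in) estimate of this quantity; the estimator used in this work is not specified here, so the estimator, the function names, the `min_pairs` threshold, and the toy `toy_pairs` generator are all assumptions made for illustration.

```python
import numpy as np


def mutual_information(x, y, bins=16):
    """Plug-in (2D histogram) estimate of I(X;Y) in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)      # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)      # marginal of y, shape (1, bins)
    nonzero = pxy > 0                        # skip empty cells (0 * log 0 = 0)
    ratio = pxy[nonzero] / (px @ py)[nonzero]
    return float(np.sum(pxy[nonzero] * np.log(ratio)))


def select_parameters(candidates, make_pairs, min_pairs=50, bins=16):
    """Return (best_params, best_mi) over the candidate parameter values.

    `make_pairs(params)` is a user-supplied (hypothetical) callable that runs
    the co-location for `params` and returns two equal-length arrays of
    matched values. Candidates yielding fewer than `min_pairs` pairs are
    skipped, since the histogram estimate is unreliable for small samples.
    """
    best, best_mi = None, -np.inf
    for params in candidates:
        x, y = make_pairs(params)
        if len(x) < min_pairs:
            continue
        mi = mutual_information(x, y, bins=bins)
        if mi > best_mi:
            best, best_mi = params, mi
    return best, best_mi


# Toy stand-in for a real co-location: mismatch noise grows with distance.
rng = np.random.default_rng(0)
truth = rng.normal(size=5000)


def toy_pairs(max_dist_km):
    noise = rng.normal(scale=0.02 * max_dist_km, size=truth.size)
    return truth, truth + noise


best, best_mi = select_parameters([10, 50, 200], toy_pairs)
```

The toy generator keeps the number of pairs fixed, so it only exercises the search loop and always favours the smallest distance. In a real co-location, smaller distances also yield fewer pairs, and the plug-in estimate is biased upwards for small samples, so a bias-corrected or otherwise more robust estimator may be needed before the maximum can be trusted.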
We apply this method of co-locating data to a validation of the cloud layer height retrievals in the ICESat-2 ATL09 data product against surface-based Cloudnet retrievals. Our method finds location-specific distances within which ICESat-2 data should be compared against data from a given Cloudnet observatory, and shows that a one-size-fits-all approach to selecting the co-location parameterisation degrades the quality of the resulting matched data through different failure modes, depending on the location. The agreement between vertical cloud fraction profiles from ATL09 and Cloudnet data is demonstrably better when using optimised co-location parameters than with other choices.
As well as improving the quality of data provided to satellite validation studies, this method can be used across many contexts. It may be possible to improve multi-sensor data synthesis by weighting the contribution of each data product to the synthesis as a function of the mutual information between the input data products. The method can also be used to programmatically generate labelled training pairs of related data for deep learning models that best encode the relationship between the data sources.
How to cite: Martin, A., Guy, H., Gallagher, M., and Neely III, R.: Non-parametric optimised spatiotemporal data co-location using mutual information, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-18704, https://doi.org/10.5194/egusphere-egu26-18704, 2026.