- ¹Department of Geography, Ludwig-Maximilians-University Munich, Munich, Germany (v.zerres@lmu.de)
- ²Institute of Landscape Ecology, Universität Münster, Münster, Germany
Reliable global datasets of key forest variables are urgently needed to monitor forest dynamics on both regional and global scales. Forest canopy height is one of these key variables because it correlates closely with forest biomass and carbon stocks. Recently, promising new datasets have been developed that use deep convolutional neural networks to predict canopy height from optical Sentinel-2 satellite data on a global scale. But how is this possible, given that there is no physical relationship between optical data and canopy height? To understand what the models have learned, we expanded upon the study of Lang et al. (2023) and quantified the contributions of geographical, spectral, and contextual features to the model's output. To evaluate the effect of geographical coordinates, we systematically altered the geo-locations of the model input scenes while keeping the spectral features identical. The resulting canopy height predictions revealed consistent dependencies on geographic location, with mean increases of up to 10 m across entire Sentinel-2 scenes. Effect sizes for latitudinal shifts were large (Cohen's d ≈ 1), indicating that the model interprets spectrally identical input data differently at different locations. This suggests that these subtle biases arise from spatial priors learned by the model ensemble. Consequently, prediction accuracy decreases in areas where forest height differs substantially from the mean height typical of the respective biome or climate zone, e.g., due to local soil properties, climatic effects, or uncommon forestry management practices. To isolate the effect of spectral properties, we increased and decreased the values of single spectral bands in discrete steps while keeping the geographic locations unchanged. Mean differences in canopy height predictions, relative to those derived from unmanipulated input data, varied across bands, degrees of manipulation, and sample locations.
The observed changes were not systematically connected to the manipulated spectral data, suggesting that spectral features did not significantly influence the model's output. By modifying the input data, we highlighted potentially significant obstacles to the further development of AI-driven models of key forest variables, which need to be taken into account when such models are applied.
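The two quantities used above, the effect size of a location shift and the manipulation of a single spectral band, can be illustrated with a minimal sketch. It is not the code used in this study: the pooled-standard-deviation form of Cohen's d, the multiplicative band scaling, the `(bands, height, width)` array layout, and the function names `cohens_d` and `scale_band` are all assumptions made for illustration.

```python
import numpy as np


def cohens_d(baseline, shifted):
    """Cohen's d between two samples, using the pooled standard deviation.

    Assumption: the pooled-variance form; other variants of d exist.
    """
    baseline = np.asarray(baseline, dtype=float)
    shifted = np.asarray(shifted, dtype=float)
    n1, n2 = baseline.size, shifted.size
    pooled_var = (
        (n1 - 1) * baseline.var(ddof=1) + (n2 - 1) * shifted.var(ddof=1)
    ) / (n1 + n2 - 2)
    return (shifted.mean() - baseline.mean()) / np.sqrt(pooled_var)


def scale_band(scene, band, factor):
    """Return a copy of `scene` with one band multiplied by `factor`.

    Assumption: `scene` has shape (bands, height, width) and the
    manipulation is multiplicative; the input array is left unchanged.
    """
    out = np.array(scene, dtype=float, copy=True)
    out[band] *= factor
    return out


# Hypothetical usage with synthetic numbers (not study data):
# predicted heights for the same pixels at the original and a shifted location.
heights_original = np.array([18.0, 20.0, 22.0, 24.0])
heights_shifted = np.array([21.0, 23.0, 25.0, 27.0])
effect_size = cohens_d(heights_original, heights_shifted)

# Increase band index 3 of a synthetic 12-band scene by 10 %.
scene = np.ones((12, 4, 4))
manipulated = scale_band(scene, band=3, factor=1.1)
```

In such a setup, the manipulated scene and the original scene would each be passed to the height model, and the mean difference between the two predictions would be reported per band and manipulation step.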
How to cite: Zerres, V., Sanchez, E., Nowosad, J., Meyer, H., and Lukas, L.: Global Canopy Height Models from Optical Satellite Data: What Has the AI Learned?, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-11803, https://doi.org/10.5194/egusphere-egu26-11803, 2026.