Hard to measure, hard to model: Using information theory to understand turbulent heat fluxes
- University of Washington, Seattle, United States of America (andrbenn@uw.edu)
Measurements and models are the core modes of understanding environmental processes, and a major paradigm of doing science involves confronting hypotheses (represented by models) with data from measurements. Both models and measurements carry uncertainties, which can make it difficult to reason about the validity of our hypotheses. This difficulty is exemplified in the study of turbulent heat fluxes, where measurements made by eddy-covariance towers often have energy balance gaps and simple regression models often outperform the most sophisticated physically based models. Our study addresses these issues by identifying the conditions under which models, measurements, or both break down, as well as potential reasons for these breakdowns.
We use the Structure for Unifying Multiple Modeling Alternatives (SUMMA) to develop an ensemble of models representing multiple hypotheses about how turbulent heat fluxes are generated, and we compare them against measurements from FluxNet towers at a number of hydro-climatically diverse sites. We evaluate the models against the measurements using both traditional error measures and a general framework based on information theory and conditional probabilities. Extending this base analysis, we compute the conditional mutual information of the modeled and observed relationships between turbulent heat fluxes and other meteorological variables (such as shortwave radiation, air temperature, and humidity). This allows us to go further than traditional error measures and explore how well the modeled relationships match the observed ones, providing a proxy for process correctness. We perform this analysis for a variety of conditions. We first analyze how much information the meteorological variables provide about the observed heat fluxes to estimate the robustness of the measurements. We then compare this with the amount of information that the meteorological variables provide about the simulated heat fluxes to determine whether the information shared in the simulations deviates significantly from that shared in the observations. This analysis is used to provide recommendations for post-processing observations and to identify possible process deficiencies in our models.
How to cite: Bennett, A. and Nijssen, B.: Hard to measure, hard to model: Using information theory to understand turbulent heat fluxes, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-5957, https://doi.org/10.5194/egusphere-egu2020-5957, 2020
Comments on the display
Hi,
This was a very nice presentation.
Could you explain a little more how you computed NMI?
For the comparison to KGE (the scatter plot), X is the model simulation and Y the obs, but in the box plots X and Y are two different processes and you compute one number for the obs and one for the simulations. Is this correct? How do these results then relate back to KGE?
Thank you
Thanks Alison!
> Could you explain a little more how you computed NMI?
Sure. To compute both the entropy and the mutual information we used a nearest neighbor estimator. Here's a link to the paper describing the estimator: https://journals.aps.org/pre/abstract/10.1103/PhysRevE.69.066138
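As a rough illustration of the nearest-neighbor idea (not the code used for the presentation), here is a minimal brute-force sketch of the first estimator from the linked paper (Kraskov et al., 2004) for two 1-D variables. The function names `digamma_int` and `ksg_mutual_information`, the choice `k=3`, and the synthetic Gaussian data are all assumptions made for this example. The sketch returns the unnormalized mutual information in nats; the normalization used to obtain NMI is not specified in this thread, so it is left out.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def ksg_mutual_information(x, y, k=3):
    """Brute-force k-nearest-neighbor estimate of I(X;Y) in nats (1-D x and y)."""
    n = len(x)
    total = 0.0
    for i in range(n):
        # Max-norm distance from point i to every other point in the joint space.
        dists = sorted(
            max(abs(x[i] - x[j]), abs(y[i] - y[j])) for j in range(n) if j != i
        )
        eps = dists[k - 1]  # distance to the k-th nearest neighbor
        # Count the other points strictly closer than eps in each marginal space.
        n_x = sum(1 for j in range(n) if j != i and abs(x[i] - x[j]) < eps)
        n_y = sum(1 for j in range(n) if j != i and abs(y[i] - y[j]) < eps)
        total += digamma_int(n_x + 1) + digamma_int(n_y + 1)
    return digamma_int(k) + digamma_int(n) - total / n


# Synthetic check: correlated vs. independent Gaussian samples (illustrative only).
rng = random.Random(0)
rho = 0.9
x = [rng.gauss(0, 1) for _ in range(300)]
y_dep = [rho * xi + math.sqrt(1 - rho**2) * rng.gauss(0, 1) for xi in x]
y_ind = [rng.gauss(0, 1) for _ in x]
print("dependent:  ", ksg_mutual_information(x, y_dep))
print("independent:", ksg_mutual_information(x, y_ind))
```

For a bivariate Gaussian with correlation 0.9 the analytic mutual information is -0.5 ln(1 - 0.81), about 0.83 nats, so the dependent estimate should land in that neighborhood and the independent one near zero. The O(n²) loops keep the sketch dependency-free; for real tower records a tree-based neighbor search would be the practical choice.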
> For the comparison to KGE (the scatter plot), X is the model simulation and Y the obs, but in the box plots X and Y are two different processes and you compute one number for the obs and one for the simulations. Is this correct?
Yes, this is correct. For the KGE-NMI plot we compute both the NMI and the KGE between simulated and observed latent heat. Then, for the boxplots, we compare different processes (either shortwave radiation or antecedent precipitation). We calculate the NMI between that process and latent heat separately for the simulations and the observations, as you mention. This is to try to understand whether the model and observations have similar process interactions.
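For readers less familiar with the KGE side of that comparison, a minimal sketch of the commonly used Kling-Gupta efficiency is below. This assumes the original formulation (correlation, variability ratio, and bias ratio combined as a Euclidean distance from the ideal point); the thread does not say which KGE variant was used, and the function name and toy numbers are made up for the example.

```python
import math
import statistics


def kling_gupta_efficiency(sim, obs):
    """KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2)."""
    mu_s, mu_o = statistics.fmean(sim), statistics.fmean(obs)
    sd_s, sd_o = statistics.pstdev(sim), statistics.pstdev(obs)
    # Pearson correlation from the population covariance.
    cov = sum((s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)) / len(obs)
    r = cov / (sd_s * sd_o)
    alpha = sd_s / sd_o  # variability ratio
    beta = mu_s / mu_o   # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Toy latent heat values (hypothetical numbers, W m^-2).
obs = [50.0, 80.0, 120.0, 90.0, 60.0]
sim = [55.0, 75.0, 110.0, 95.0, 70.0]
print(kling_gupta_efficiency(sim, obs))
```

A perfect simulation gives KGE = 1. Because the score only summarizes correlation, spread, and mean of the paired series, it says nothing directly about how latent heat co-varies with a driver such as shortwave radiation, which is what the boxplot comparison targets.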
> How do these results then relate back to KGE?
These results don't quite relate back to the KGE in a 1-to-1 fashion, since they have been filtered differently. In some other work (not shown) I have looked at computing KGE for these subsets (dry and wet periods, for instance) and have found that it is possible to have "good" performance in KGE while the process interactions still differ between the simulated and observed quantities. I plan on digging into these types of situations more in the future, and I hope this can inform the way that we construct these SUMMA ensembles to be as physically accurate as possible.