Inverse problems are encountered in many fields of the geosciences. One class of inverse problems, in the context of predictability, is the assimilation of observations into dynamical models of the system under study. Furthermore, objective quantification of the uncertainty in the results obtained is the object of growing concern and interest.

This session will be devoted to the presentation and discussion of methods for inverse problems, data assimilation and associated uncertainty quantification, in ocean and atmosphere dynamics, atmospheric chemistry, hydrology, climate science, solid earth geophysics and, more generally, in all fields of geosciences.

We encourage presentations on advanced methods, and related mathematical developments, suitable for situations in which local linear and Gaussian hypotheses are not valid and/or in which significant model errors are present. We also welcome contributions dealing with algorithmic aspects and the numerical implementation of the solution of inverse problems and quantification of the associated uncertainty, as well as novel methodologies at the crossroads between data assimilation and purely data-driven, machine-learning-type algorithms.

Invited speakers:
Luca Cantarello (University of Leeds)
Jean-Michel Brankart (University of Grenoble)

Public information:
In this session we encourage all participants to present their work. These brief presentations will last about 5 minutes.

Convener: Javier Amezcua | Co-conveners: Natale Alberto Carrassi, Tijana Janjic, Olivier Talagrand
Attendance Tue, 05 May, 08:30–10:15 (CEST)

Chat time: Tuesday, 5 May 2020, 08:30–10:15

Chairperson: Javier Amezcua
D2841 | Highlight
Luca Cantarello, Onno Bokhove, Gordon Inverarity, Stefano Migliorini, and Steve Tobias

Operational data assimilation (DA) schemes rely significantly on satellite observations, and much research has been aimed at optimising their use, leading to a great deal of progress. Here, we investigate the impact of the spatio-temporal variability of satellite observations on DA: is there a case for concentrating effort on the assimilation of small-scale convective features over the large-scale dynamics, or vice versa?


We conduct our study in an isentropic one-and-a-half-layer model that mimics convection and precipitation, a revised and more realistic version of the idealised model based on the shallow water equations in [1,2]. Forecast-assimilation experiments are performed by means of a twin-setting configuration, in which pseudo-observations from a high-resolution nature run are combined with lower-resolution forecasts. The DA algorithm used is the deterministic ensemble Kalman filter (see [3]). We focus our research on polar-orbiting satellites measuring emitted microwave radiation.
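The analysis step of the deterministic ensemble Kalman filter of [3] can be sketched in a few lines. This is a minimal illustration with a linear observation operator and random toy data, not the authors' actual configuration; all dimensions and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, p = 10, 20, 4          # state dimension, ensemble size, number of obs
E = rng.normal(size=(n, m))  # forecast ensemble (columns are members)
H = np.zeros((p, n))
H[np.arange(p), np.arange(p)] = 1.0   # observe the first p state variables
R = 0.5 * np.eye(p)          # observation-error covariance
y = rng.normal(size=p)       # (pseudo-)observations

xb = E.mean(axis=1, keepdims=True)    # ensemble mean
A = E - xb                            # ensemble anomalies
P = A @ A.T / (m - 1)                 # sample forecast covariance
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain

xa = xb + K @ (y[:, None] - H @ xb)   # mean updated with the full gain
Aa = A - 0.5 * K @ H @ A              # DEnKF: anomalies updated with half the gain
Ea = xa + Aa                          # analysis ensemble
```

The half-gain anomaly update is what distinguishes the deterministic formulation from stochastic, perturbed-observation variants.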


We have developed a new observation operator and a representative observing system in which both ground and satellite observations can be assimilated. The convection thresholds in the model are used as a proxy for cloud formation, clouds, and precipitation. To imitate the use of weighting functions in real satellite applications, radiance values are computed as a weighted sum with contributions from both layers. In the presence of clouds and/or precipitation, we model the response of passive microwave radiation to either precipitating or non-precipitating clouds. The horizontal resolution of satellite observations can be varied to investigate the impact of scale-dependency on the analysis.


New, preliminary results from experiments including both transverse jets and rotation in a periodic domain will be reported and discussed.



[1] Kent, T., Bokhove, O., & Tobias, S. (2017). Dynamics of an idealized fluid model for investigating convective-scale data assimilation. Tellus A: Dynamic Meteorology and Oceanography, 69(1), 1369332.

[2] Kent, T. (2016). An idealised fluid model for convective-scale NWP: dynamics and data assimilation (Doctoral dissertation, PhD Thesis, University of Leeds).

[3] Sakov, P., & Oke, P. R. (2008). A deterministic formulation of the ensemble Kalman filter: an alternative to ensemble square root filters. Tellus A: Dynamic Meteorology and Oceanography, 60(2), 361-371.


How to cite: Cantarello, L., Bokhove, O., Inverarity, G., Migliorini, S., and Tobias, S.: Idealised satellite data assimilation experiments with clouds and precipitation, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-332, https://doi.org/10.5194/egusphere-egu2020-332, 2020.

D2842 | Highlight
Jean-Michel Brankart

Many practical applications involve the resolution of large-size inverse problems, without providing more than a moderate-size sample to describe the prior probability distribution. In this situation, additional information must be supplied to augment the effective dimension of the available sample, for instance using a covariance localization approach. In this study, it is suggested that covariance localization can be efficiently applied to an approximate variant of the Metropolis-Hastings algorithm, by modulating the ensemble members by the large-scale patterns of other members. Modulation is used to design a (global) proposal probability distribution (i) that can be sampled at a very low cost, (ii) that automatically accounts for a localized prior covariance, and (iii) that leads to an efficient sampler for the augmented prior probability distribution or for the posterior probability distribution. The resulting algorithm is applied to an academic example, illustrating (i) the effectiveness of covariance localization, (ii) the ability of the method to deal with nonlocal/nonlinear observation operators and non-Gaussian observation errors, (iii) the reliability, resolution and optimality of the updated ensemble, using probabilistic scores appropriate to a non-Gaussian posterior distribution, and (iv) the scalability of the algorithm as a function of the size of the problem. The codes are openly available from github.com/brankart/ensdam.
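A generic modulation-based ensemble augmentation (not necessarily the author's exact construction) can be sketched as follows: each anomaly is multiplied elementwise by scaled eigenvectors of a localization matrix, and the sample covariance of the augmented set approximates the localized covariance. All sizes and the Gaussian localization kernel are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

n, m = 50, 5
A = rng.normal(size=(n, m))             # ensemble anomalies (columns)

# Gaussian localization matrix and its leading eigenpairs
x = np.arange(n)
C = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 10.0) ** 2)
lam, V = np.linalg.eigh(C)
idx = np.argsort(lam)[::-1][:4]         # keep the 4 leading modes
lam, V = lam[idx], V[:, idx]

# modulated (augmented) ensemble: each anomaly times each scaled eigenvector
Z = np.column_stack([np.sqrt(lam[j]) * V[:, j] * A[:, i]
                     for j in range(4) for i in range(m)])

# the sample covariance of Z approximates the localized covariance C ∘ (A Aᵀ)
P_loc = Z @ Z.T
```

With the full set of eigenpairs the identity Z Zᵀ = C ∘ (A Aᵀ) is exact; truncating to a few modes trades accuracy for a much smaller augmented ensemble.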

How to cite: Brankart, J.-M.: Implicitly localized MCMC sampler to cope with nonlocal/nonlinear data constraints in large-size inverse problems, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-2182, https://doi.org/10.5194/egusphere-egu2020-2182, 2020.

D2843 | Highlight
Milija Zupanski

High-dimensional ensemble data assimilation applications require error covariance localization in order to address the problem of insufficient degrees of freedom, typically accomplished using observation-space covariance localization. However, this creates a challenge for vertically integrated observations, such as satellite radiances, aerosol optical depth, etc., since an exact observation location in the vertical does not exist. For nonlinear problems, there is an implied inconsistency in iterative minimization due to the use of observation-space localization, which effectively prevents finding the optimal global minimizing solution. Using state-space localization, however, in principle resolves both issues associated with observation-space localization.


In this work we present a new nonlinear ensemble data assimilation method that employs covariance localization in state space and finds an optimal analysis solution. The new method resembles “modified ensembles” in the sense that ensemble size is increased in the analysis, but it differs in methodology used to create ensemble modifications, calculate the analysis error covariance, and define the initial ensemble perturbations for data assimilation cycling. From a practical point of view, the new method is considerably more efficient and potentially applicable to realistic high-dimensional data assimilation problems. A distinct characteristic of the new algorithm is that the localized error covariance and minimization are global, i.e. explicitly defined over all state points. The presentation will focus on examining feasible options for estimating the analysis error covariance and for defining the initial ensemble perturbations.

How to cite: Zupanski, M.: Development of a nonlinear ensemble data assimilation method with global state-space covariance localization, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19831, https://doi.org/10.5194/egusphere-egu2020-19831, 2020.

D2844 | Highlight
Nachiketa Chakraborty, Peter Jan van Leeuwen, Michael de Caria, and Manuel Pulido

Time-varying processes in nature are often complex, with nonlinear and non-Gaussian components. The complexity of environments and processes makes it hard to disentangle the different causal mechanisms which drive an observed time series, and harder to make forecasts. The standard ways of studying causal relations in the geosciences, which include information-theoretic measures of causation as well as predictive frameworks, have deficiencies when applied to nonlinear dynamical systems. Here we focus on building a predictive causal framework that allows us to make predictions in simpler systems in a consistent way. We use a Bayesian framework to embed causal measures akin to mutual information from information theory to quantify relations between different random processes in the system. We examine causal relations in toy models and simple systems with a view to eventually applying the framework to the inter-ocean exchange problem in the Indian, South Atlantic and Southern Oceans.

How to cite: Chakraborty, N., van Leeuwen, P. J., de Caria, M., and Pulido, M.: A framework for causality under data assimilation , EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22521, https://doi.org/10.5194/egusphere-egu2020-22521, 2020.

D2845 |
Maxime Conjard and Henning Omre

The challenge of data assimilation for models representing spatio-temporal phenomena is harder when the spatial histogram of the variable of interest has multiple modes. Pollution source identification constitutes one example, where the pollution release represents an extreme event against a fairly homogeneous background; consequently, our prior belief is that the spatial histogram is bimodal. The traditional Kalman model is based on a Gaussian initial distribution and Gauss-linear dynamic and observation models. This model is contained in the class of Gaussian distributions and is therefore analytically tractable. The properties that make it strong, however, also render it unsuitable for representing multimodality. To address the issue, we define the selection Kalman model. It is based on a selection-Gaussian initial distribution and Gauss-linear dynamic and observation models. The selection-Gaussian distribution may represent multimodality, skewness and peakedness, and can be seen as a generalization of the Gaussian distribution. The proposed selection Kalman model is contained in the class of selection-Gaussian distributions and is therefore analytically tractable. The recursive algorithm used for assessing the selection Kalman model is specified. We present a synthetic case study of spatio-temporal inversion of an initial state containing an extreme event, inspired by pollution monitoring. The results suggest that the selection Kalman model offers significant improvements over the traditional Kalman model when reconstructing discontinuous initial states.
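For reference, the traditional Kalman model that the selection Kalman model generalises is fully specified by its Gauss-linear recursions. A minimal sketch of one forecast/analysis cycle (toy matrices and values, purely illustrative):

```python
import numpy as np

# Gauss-linear state-space model: x_k = F x_{k-1} + w,  y_k = H x_k + v
F = np.array([[1.0, 1.0], [0.0, 1.0]])   # dynamic model
Q = 0.01 * np.eye(2)                      # model-error covariance
H = np.array([[1.0, 0.0]])                # observe the first component
R = np.array([[0.25]])                    # observation-error covariance

mu, P = np.zeros(2), np.eye(2)            # Gaussian initial distribution
y = 1.3                                   # one observation

# forecast step
mu = F @ mu
P = F @ P @ F.T + Q

# analysis step (closed form because everything stays Gaussian)
S = H @ P @ H.T + R
K = P @ H.T @ np.linalg.inv(S)
mu = mu + K @ (np.array([y]) - H @ mu)
P = (np.eye(2) - K @ H) @ P
```

In the selection Kalman model the Gaussian initial distribution is replaced by a selection-Gaussian one, and an analogous (analytically tractable) recursion holds.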

How to cite: Conjard, M. and Omre, H.: Spatio-temporal Inversion using the Selection Kalman Model, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-8979, https://doi.org/10.5194/egusphere-egu2020-8979, 2020.

D2846 |
Antoine Bernigaud, Serge Gratton, Flavia Lenti, Ehouarn Simon, and Oumaima Sohab

We introduce a new formulation of the 4DVAR objective function by using as a penalty term a p-norm with 1 < p < 2. So far, only the 2-norm, the 1-norm or a mix of both have been considered as regularization terms. This approach is motivated by the nature of the problems encountered in data assimilation, for which such a norm may be better suited to the distribution of the variables. It also aims at a compromise between the 2-norm, which tends to oversmooth the solution or produce Gibbs oscillations, and the 1-norm, which tends to "oversparsify" it, in addition to making the problem non-smooth.

The performance of the proposed technique is assessed for different p-values by twin experiments on a linear advection equation. The experiments are conducted using two different true states in order to assess the performance of the p-norm regularized 4DVAR algorithm in sparse (rectangular function) and "almost" sparse cases (rectangular function with a smoother slope). In this setup, the background and measurement noise covariances are known.

In order to minimize the 4DVAR objective function with a p-norm as a regularization term, we use a gradient descent algorithm that requires duality operators to work on a non-Euclidean space: indeed, R^n together with the p-norm (1 < p < 2) is a Banach space. Finally, to tune the regularization parameter appearing in the formulation of the objective function, we use Morozov's discrepancy principle.
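The kind of objective involved can be illustrated with a toy least-squares misfit plus a p-norm penalty, minimised here by plain Euclidean gradient descent rather than the Banach-space duality machinery described above; operator, truth and parameter values are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

n, k = 30, 15
G = rng.normal(size=(k, n))               # linear forward/observation operator
x_true = np.zeros(n)
x_true[10:15] = 1.0                       # sparse "rectangular" truth
y = G @ x_true + 0.01 * rng.normal(size=k)

p, lam, step = 1.5, 1e-3, 1e-3

def grad(x):
    misfit = G.T @ (G @ x - y)                          # grad of 0.5 ||Gx - y||^2
    reg = lam * p * np.sign(x) * np.abs(x) ** (p - 1)   # grad of lam * ||x||_p^p
    return misfit + reg

x = np.zeros(n)
for _ in range(5000):
    x = x - step * grad(x)                # simple fixed-step gradient descent
```

For 1 < p < 2 the penalty is differentiable everywhere (unlike the 1-norm), which is what keeps the plain gradient iteration well defined.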

How to cite: Bernigaud, A., Gratton, S., Lenti, F., Simon, E., and Sohab, O.: p-norm regularization in variational data assimilation, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-5772, https://doi.org/10.5194/egusphere-egu2020-5772, 2020.

D2847 | Highlight
Arthur Filoche, Julien Brajard, Anastase Charantonis, and Dominique Béréziat

The analogy between data assimilation and machine learning has already been shown and is still being investigated to address the problem of improving physics-based models. Even though both techniques learn from data, machine learning focuses on inferring model parameters while data assimilation concentrates on hidden system state estimation with the help of a dynamical model.
Also, neural networks, and more precisely ResNet-like architectures, can be seen as dynamical systems and numerical schemes, respectively. They are now considered state of the art in a vast number of tasks involving spatio-temporal forecasting. But training such networks requires dense and representative data, which is rarely available in the earth sciences. At the same time, data assimilation offers a proper Bayesian framework for learning from partial, noisy and indirect observations. Thus, each field can profit from the other, by providing either a learnable class of dynamical models or dense data sets.

In this work, we benefit from powerful and flexible tools provided by the deep learning community, based on automatic differentiation, that are clearly suitable for variational data assimilation, avoiding explicit adjoint modelling. We use a hybrid model divided into two terms. The first term is a numerical scheme that comes from the discretisation of physics-based equations; the second is a convolutional neural network that represents the unresolved part of the dynamics. From the data assimilation point of view, our network can be seen as a particular parameterisation of the model error. We then jointly learn this parameterisation and estimate hidden system states within a variational data assimilation scheme. Indirectly, the issue of incorporating physical knowledge into machine learning models is also addressed.

We show that the hybrid model improves forecast skill compared to traditional data assimilation techniques. The generalisation of the method on different models and data will also be discussed.

How to cite: Filoche, A., Brajard, J., Charantonis, A., and Béréziat, D.: Learning missing part of physics-based models within a variational data assimilation scheme, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-285, https://doi.org/10.5194/egusphere-egu2020-285, 2020.

D2848 |
Haonan Ren, Peter Jan Van Leeuwen, and Javier Amezcua

Data assimilation has often been performed under the perfect-model assumption, known as the strong-constraint setting. An increasing number of studies account for model errors (the weak-constraint setting), but often with different degrees of approximation or simplification, without knowing their impact on the data assimilation results. We investigate the effect that inaccurate model errors, in particular an inaccurate time correlation, can have on data assimilation results, using a Kalman smoother and an ensemble Kalman smoother.
We choose a linear auto-regressive model for the experiment. We assume the true state of the system has the correct, fixed correlation time-scale ωr in the model errors, while the prior (or background) generated by the model contains model error with a fixed, guessed time-scale ωg, which differs from the correct one and is also used in the data assimilation process. There are 10 variables in the system, and we separate the simulation period into multiple time windows. We use a fairly large ensemble (up to 200 members) to improve the accuracy of the data assimilation results. To evaluate the performance of the EnKS with auto-correlated model errors, we calculate the ratio of the root-mean-square error to the ensemble spread.
The results with a single observation at the end of the simulation time window show that using an underestimated correlation time-scale leads to an overestimated ensemble spread, while an overestimated time-scale leads to an underestimated spread. However, with a very dense observation frequency (observing every time step, for instance), the results are the complete opposite of those with a single observation. In order to understand these results, we derive expressions for the true posterior state covariance and for the posterior covariance obtained with the incorrect decorrelation time-scale. We do this for a Kalman smoother to avoid sampling uncertainties. The results are richer than expected and highly dependent on the observation frequency. From the analytical solution of the analysis, we find that the RMSE is a function of both ωr and ωg, whereas the spread (or variance) depends only on ωg. We also find that the analysed variance is not always a monotonically increasing function of ωg; this, too, depends on the observation frequency. In general, the results show the effect of correlated model error and an incorrect correlation time-scale on data assimilation results, an effect that is strongly modulated by the observation frequency.
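The role of the guessed time-scale ωg can be illustrated with a toy random-walk model driven by AR(1) model error (a sketch under simplified assumptions, not the authors' experimental setup): the longer the assumed decorrelation time-scale, the more coherently errors accumulate and the larger the resulting ensemble spread.

```python
import numpy as np

rng = np.random.default_rng(3)

def ar1_errors(nsteps, omega, q=1.0):
    """Model-error sequence with decorrelation time-scale omega (in steps)."""
    phi = np.exp(-1.0 / omega)            # AR(1) coefficient
    e = np.zeros(nsteps)
    for k in range(1, nsteps):
        e[k] = phi * e[k - 1] + np.sqrt(q * (1 - phi ** 2)) * rng.normal()
    return e

nsteps = 500
truth = np.cumsum(ar1_errors(nsteps, 10.0))   # "model" driven by correlated error

def ensemble_spread(omega_g, n_ens=100):
    ens = np.stack([np.cumsum(ar1_errors(nsteps, omega_g)) for _ in range(n_ens)])
    return ens.std(axis=0)[-1]            # spread at the final time

# longer assumed decorrelation -> errors accumulate more coherently -> larger spread
spread_long = ensemble_spread(20.0)
spread_short = ensemble_spread(5.0)
```

Comparing spread to the RMSE against `truth` then gives the rmse/spread ratio used as an evaluation metric in the abstract.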

How to cite: Ren, H., Van Leeuwen, P. J., and Amezcua, J.: Effect of inaccurate specification of time-correlated model error in an Ensemble Smoother, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-559, https://doi.org/10.5194/egusphere-egu2020-559, 2020.

D2849 |
Jalisha Theanutti Kallingal, Marko Scholze, Janne Rinne, and Johan Lindstrom

Wetlands in the boreal zone are a significant source of atmospheric methane, and hence they have been intensively studied with mechanistic models for the assessment of methane dynamics. The arctic-enabled dynamic global vegetation model LPJ-GUESS is one of the models that allows quantification and understanding of natural methane fluxes at scales ranging from local to regional and global, but with several uncertainties. Complexity in the underlying environmental processes, warming-driven alternative paths of meteorological phenomena, and changes in hydrological and vegetation conditions all call for a calibrated and optimised LPJ-GUESS. In this study, we used a Markov chain Monte Carlo algorithm (based on the Metropolis-Hastings acceptance rule) to quantify the uncertainties of LPJ-GUESS. Application of this method allows a more thorough exploration of the posterior distribution, leading to a more complete characterisation with reduced risk of sample impoverishment. We will present first results from an assimilation experiment optimising LPJ-GUESS model process parameters using flux measurement data from 2005 to 2015 from the Siikaneva wetlands in southern Finland. We analyse the parameter efficiency of LPJ-GUESS by looking into the posterior parameter distributions, parameter correlations, and the interconnections of the processes they control. As part of this work, knowledge about how the methane data can constrain the parameters and processes is derived.
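A minimal Metropolis-Hastings sketch for a one-parameter toy model (illustrative only — the actual LPJ-GUESS optimisation involves many process parameters and a full flux model):

```python
import math
import random

random.seed(4)

# synthetic "flux" data from a one-parameter toy model f(t) = a * t
a_true = 2.0
data = [(t, a_true * t + random.gauss(0, 0.5)) for t in range(20)]

def log_posterior(a):
    # Gaussian likelihood (obs error sd 0.5) plus a weak Gaussian prior on a
    ll = sum(-0.5 * ((y - a * t) / 0.5) ** 2 for t, y in data)
    return ll - 0.5 * (a / 10.0) ** 2

chain, a = [], 0.0
lp = log_posterior(a)
for _ in range(5000):
    prop = a + random.gauss(0, 0.1)               # random-walk proposal
    lp_prop = log_posterior(prop)
    if math.log(random.random()) < lp_prop - lp:  # Metropolis-Hastings acceptance
        a, lp = prop, lp_prop
    chain.append(a)

posterior_mean = sum(chain[1000:]) / len(chain[1000:])   # discard burn-in
```

The retained chain samples approximate the posterior distribution of the parameter, from which means, spreads and correlations (in the multi-parameter case) can be read off.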

How to cite: Theanutti Kallingal, J., Scholze, M., Rinne, J., and Lindstrom, J.: Data assimilation framework around the LPJ-GUESS model for the optimised simulation of CH4 emission from Northern wetlands, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10752, https://doi.org/10.5194/egusphere-egu2020-10752, 2020.

D2850 |
Martin Verlaan, Xiaohui Wang, and Hai Xiang Lin

Previous development of a parameter estimation scheme for a Global Tide and Surge Model (GTSM) showed that accurate estimation of the parameters is currently limited by the memory use of the analysis step and by the computational demand. Because the estimation algorithm requires storage of the model output matching each observation for each parameter (or ensemble member), the memory requirement grows out of control as the model simulation time increases: the model output and observation matrices become too large. The popular approach of localization does not work here, because the tides propagate all over the globe in days, while parameter estimation requires weeks at least. Proper orthogonal decomposition (POD) is a useful technique to approximate a high-dimensional system with a smaller linear subspace. Singular value decomposition (SVD) is one method to derive the POD modes, and it is generally applied to spatial patterns. In this study, we focus instead on applying POD to temporal patterns, using SVD to reduce the dimension in time. As expected, the time patterns show a strong resemblance to the tidal constituents, but the same method is likely to work for a wider range of problems. The results indicate that the memory requirements can be reduced dramatically by projecting the model output and observations onto the time-POD patterns.
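The idea of projecting model output onto a few time-POD patterns obtained by SVD can be sketched as follows (synthetic two-constituent signal; all names and sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

nt, nx = 500, 40                         # time steps, observation locations
t = np.linspace(0, 10, nt)

# model output dominated by a few "tidal" time patterns plus small noise
Y = (np.outer(np.sin(2 * np.pi * t), rng.normal(size=nx))
     + 0.5 * np.outer(np.sin(4 * np.pi * t), rng.normal(size=nx))
     + 0.01 * rng.normal(size=(nt, nx)))

U, s, Vt = np.linalg.svd(Y, full_matrices=False)
r = 2                                    # keep two time-POD modes
T = U[:, :r]                             # temporal patterns (nt x r)

Y_red = T.T @ Y                          # reduced storage: (r x nx) instead of (nt x nx)
Y_rec = T @ Y_red                        # reconstruction from the reduced representation

rel_err = np.linalg.norm(Y - Y_rec) / np.linalg.norm(Y)
```

Here storing `T` and `Y_red` replaces the full (nt x nx) output matrix, which is the kind of memory saving the abstract describes.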

How to cite: Verlaan, M., Wang, X., and Lin, H. X.: Reducing the memory requirements of parameter estimation using model order reduction, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-5717, https://doi.org/10.5194/egusphere-egu2020-5717, 2020.

D2851 | Highlight
Jeffrey Anderson, Nancy Collins, Moha El Gharamti, Timothy Hoar, Kevin Raeder, Frederic Castruccio, Jingjing Liang, John Lin, James McCreight, Seongjin Noh, Brett Raczka, and Arezoo Rfieeinasab

The Data Assimilation Research Testbed (DART) is a community facility for ensemble data assimilation developed and maintained by the National Center for Atmospheric Research (NCAR). DART provides ensemble data assimilation capabilities for NCAR community earth system models and many other prediction models. It is straightforward to add interfaces for new models and new observations to DART.

DART provides traditional ensemble data assimilation algorithms that implicitly assume Gaussianity and linearity. Traditional algorithms can still work when these assumptions are violated; however, it is possible to greatly improve results by extending ensemble algorithms to explicitly account for aspects of nonlinearity and non-Gaussianity. Two new algorithms have been added to DART: (1) anamorphosis, which transforms variables to make the assimilation problem more linear and Gaussian before transforming posterior estimates back to the original model variables; (2) the marginal correction rank histogram filter (MCRHF), which directly represents arbitrary non-Gaussian distributions. These methods are particularly valuable for data assimilation of bounded quantities like tracers or streamflow.
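A rank-based Gaussian anamorphosis of a bounded ensemble can be sketched as follows (a generic construction, not necessarily the transform implemented in DART):

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(6)

# bounded, non-Gaussian ensemble (e.g. a positive tracer concentration)
ens = rng.lognormal(mean=0.0, sigma=1.0, size=500)

nd = NormalDist()

def anamorphosis(ens):
    """Rank-based transform of an ensemble to standard-Gaussian values."""
    ranks = np.argsort(np.argsort(ens))          # ranks 0 .. n-1
    u = (ranks + 0.5) / len(ens)                 # uniform values in (0, 1)
    return np.array([nd.inv_cdf(ui) for ui in u])

g = anamorphosis(ens)
# g is suitable for a Gaussian analysis update; the inverse mapping
# (e.g. interpolation through the sorted (g, ens) pairs) returns the
# updated values to the bounded physical space
```

Because the transform is monotone, it preserves the ordering of members while making the marginal distribution Gaussian, which is what lets a linear-Gaussian update be applied to a bounded variable.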

DART is being applied to a number of novel applications. Examples in the poster include: (1) an eddy-resolving global ocean ensemble reanalysis with the POP ocean model and ensemble optimal interpolation; (2) the WRF-Hydro/DART system, which now includes a multi-parametric ensemble, anamorphosis, and spatially correlated noise for the forcing fields; (3) results from the Carbon Monitoring System over Mountains using CLM5 to assimilate remotely sensed observations (LAI, biomass, and SIF) for a field site in Colorado; (4) assimilation of MODIS snow cover fraction and daily GRACE total water storage data and its impact on soil moisture using the DART/NOAH-MP system; (5) an ensemble atmospheric reanalysis using the CAM general circulation model.

How to cite: Anderson, J., Collins, N., El Gharamti, M., Hoar, T., Raeder, K., Castruccio, F., Liang, J., Lin, J., McCreight, J., Noh, S., Raczka, B., and Rfieeinasab, A.: The Data Assimilation Research Testbed: Nonlinear Algorithms and Novel Applications for Community Ensemble Data Assimilation, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-3128, https://doi.org/10.5194/egusphere-egu2020-3128, 2020.

D2852 |
Mariusz Pagowski, Cory Martin, Bo Huang, Daryl Kleist, and Shobha Kondragunta

In 2016, NOAA chose the FV3 (Finite Volume) dynamical core as the basis for its future global modeling system. For aerosol modeling, this dynamical core was supplemented with GFS (Global Forecast System) physics and coupled through an interface with the GOCART (Goddard Global Ozone Chemistry Aerosol Radiation and Transport) parameterization. The assimilation methodology relies on a hybrid variational-ensemble approach within the newly developed, model-agnostic JEDI (Joint Effort for Data assimilation Integration) framework. Observations include 550 nm AOD retrievals from the VIIRS (Visible Infrared Imaging Radiometer Suite) instruments on the polar-orbiting SNPP and NOAA-20 satellites. The system is under development, and its early results are compared with NASA's MERRA-2 and ECMWF's CAMSiRA reanalyses.


How to cite: Pagowski, M., Martin, C., Huang, B., Kleist, D., and Kondragunta, S.: Development of Ensemble-based Assimilation System for Aerosol Forecasting and Reanalysis at NOAA, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-6121, https://doi.org/10.5194/egusphere-egu2020-6121, 2020.

D2853 |
Xiaojing Li and Youmin Tang

In this study, the predictability of the Madden-Julian Oscillation (MJO) is investigated using the coupled Community Earth System Model (CESM) and the climatically relevant singular vector (CSV) method. The CSV method is an ensemble-based strategy to calculate the optimal growth of initial errors on the climate scale. We focus the CSV analysis on MJO events initialized at phase 2, facilitating the investigation of the effect of initial errors in the Indian Ocean sea surface temperature (SST). Six different MJO events are chosen as study cases to ensure the robustness of the results.

The results indicate that, for all the study cases, the optimal perturbation structure of the SST, denoted by the leading mode of the singular vectors (SVs), is a meridional dipole-like pattern between the Bay of Bengal and the southern central Indian Ocean. The MJO signal tends to be more converged and significant in the Eastern Hemisphere when the model is perturbed by the leading SV. The moist static energy analysis indicates that the eastward propagation is much more evident in the vertical advection and radiation flux terms than in the others. Therefore, the SV perturbation can strengthen and converge the MJO signal mostly by increasing the vertical advection of moist static energy.

Further, the sensitivity studies indicate that the structure of the leading SV is not sensitive to the initial states, which suggests that we might not need to calculate SVs for each initial time in constructing the ensemble prediction, significantly saving computational time in the operational forecast systems.
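The notion of an optimal (fastest-growing) initial perturbation can be sketched for a toy linear propagator: the leading right singular vector maximises perturbation growth. This is an illustrative stand-in for the idea, not the ensemble-based CSV calculation itself.

```python
import numpy as np

rng = np.random.default_rng(7)

n = 6
M = rng.normal(size=(n, n))               # stand-in tangent-linear propagator

# optimal growth: maximise ||M p|| / ||p||  ->  leading right singular vector
U, s, Vt = np.linalg.svd(M)
p_opt = Vt[0]                             # optimal initial perturbation (unit norm)
growth = np.linalg.norm(M @ p_opt)        # equals the leading singular value s[0]
```

Any other unit-norm perturbation grows by at most this factor, which is why the leading singular vector defines the optimal error structure.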

How to cite: Li, X. and Tang, Y.: Optimal Error Analysis of MJO Prediction Associated with Uncertainties in Sea Surface Temperature over Indian Ocean, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-6366, https://doi.org/10.5194/egusphere-egu2020-6366, 2020.

D2854 |
Ondřej Tichý and Václav Šmídl

The basic linear inverse problem of atmospheric release can be formulated as y = M x + e, where y is the measurement vector, typically in the form of gamma dose rates or concentrations, M is the source-receptor-sensitivity (SRS) matrix, x is the unknown source term to be estimated, and e is the model residue. The SRS matrix M is computed using an atmospheric transport model coupled with meteorological reanalyses. The inverse problem is typically ill-conditioned due to a number of uncertainties; hence, the estimation of the source term is not straightforward, and additional information, e.g. in the form of regularization or a prior source term, is often needed. Besides, traditional techniques rely on the assumption that the SRS matrix is correct, which is not realistic given the number of approximations made during its computation. Therefore, we propose a relaxation of the inverse model through the introduction of a term ΔM, such that y = (M + ΔM) x + e, leading to a non-linear inverse problem formulation, where ΔM can be, for example, a parametric perturbation of the SRS matrix M in the spatial or temporal domain. We estimate the parameters of this perturbation while solving the inverse problem using a variational Bayes procedure. The method will be validated on synthetic datasets and demonstrated on real-case scenarios such as the controlled tracer experiment ETEX and the episode of ruthenium-106 release over Europe in the fall of 2017.
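The regularised estimation of the source term in y = M x + e can be sketched with a simple Tikhonov penalty standing in for the additional prior information mentioned above (synthetic SRS-like matrix and illustrative dimensions; the ΔM perturbation and the variational Bayes estimation are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(8)

n_obs, n_src = 40, 60
M = np.abs(rng.normal(size=(n_obs, n_src)))   # synthetic non-negative SRS-like matrix
x_true = np.zeros(n_src)
x_true[25:30] = 5.0                           # short release episode
y = M @ x_true + 0.1 * rng.normal(size=n_obs)

# Tikhonov-regularised least squares: min ||M x - y||^2 + alpha ||x||^2
alpha = 1.0
x_hat = np.linalg.solve(M.T @ M + alpha * np.eye(n_src), M.T @ y)
```

The regularisation term makes the ill-conditioned normal equations solvable; in the abstract's formulation this role is played by the prior source term, and the SRS matrix itself is additionally allowed to vary through ΔM.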

How to cite: Tichý, O. and Šmídl, V.: Towards non-linear inverse problem for atmospheric source term determination, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-15879, https://doi.org/10.5194/egusphere-egu2020-15879, 2020.

D2855 |
Andres Yarce, Santiago Lopez, Diego Acosta, Olga Lucia Quintero, Nicolas Pinel, Arjo Segers, and Arnold Heemink

Chemical transport models (CTMs) simulate the emission, transformation, and transport of atmospheric chemical species, providing concentration and deposition estimates. While highly sophisticated, these are still imperfect representations of reality. Data assimilation (DA), a technique whereby observations are integrated into the simulations, helps alleviate the models' weaknesses, improving their outputs and enabling parameter and state estimation. The variational DA method is an efficient approach for large-scale parameter and state estimation, but it is not straightforward to implement due to the need for tangent linear and adjoint versions of the model forecast operator. To circumvent this difficulty, the ensemble-based 4DEnVar DA technique was used in this work.

Daily NO2 observations from the TROPOspheric Monitoring Instrument (TROPOMI) at resolutions of 3x5 km were acquired for 2019 and assimilated into the LOTOS-EUROS CTM. Due to the scarcity of ground-based monitoring stations for atmospheric gases in Colombia, especially outside urban areas, satellite data provide an attractive alternative for DA.

The 4DEnVar DA was first evaluated via the Design of Experiments (DOE) methodology with the Lorenz96 model assimilating synthetic data. Different parameters were changed (ensemble number, spread, forcing factor and width of the assimilation time window) according to a complete 2^4 factorial design followed by a Box-Behnken design, providing an empirical model that guided the selection of those tuning parameters. The evaluation criterion used to test the 4DEnVar DA performance was the root-mean-square (RMS) error between the analysis step and the synthetic data. Once this methodology was implemented, it was scaled up to the high-dimensional LOTOS-EUROS experiment.

The setup for the LOTOS-EUROS DA experiment was simplified in terms of domain area, chemical species of interest, dominant dynamics, and considerations about how to perturb the parameters or initial conditions. A range of ensemble members generated from perturbed parameters or input initial states was studied in conjunction with ensemble inflation experiments and singular value decomposition projections, characterizing the degeneracy of the Gaussian assumption through the time propagation of the ensemble. Additionally, a complementary analysis of this Gaussian ensemble degeneration was performed using the Shapiro-Wilk and Kolmogorov-Smirnov normality tests, which permitted a rational selection of the model spin-up time before the start of the assimilation window and of the DA window size.

The assimilation of satellite NO2 observations into LOTOS-EUROS made possible the estimation of parameters and states. Without assimilation, the model overestimated the observed magnitudes; the technique improves the simulation in that the analysis approaches the observations, reducing the RMS error. Through this methodology, it was possible to circumvent the absence of an adjoint model associated with the chemical components of this CTM. To our knowledge, this is the first application of ensemble variational DA on a CTM for the Northwestern South America region.

How to cite: Yarce, A., Lopez, S., Acosta, D., Quintero, O. L., Pinel, N., Segers, A., and Heemink, A.: LOTOS-EUROS 4DEnVar Data Assimilation using TROPOMI data for Colombia, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18771, https://doi.org/10.5194/egusphere-egu2020-18771, 2020.

D2856 |
Alison Fowler, Jozef Skákala, and Stefano Ciavatta

Monitoring biogeochemistry in shelf seas is of great significance for the economy, ecosystems understanding and climate studies. Data assimilation can aid the realism of marine biogeochemistry models by incorporating information from observations. An important source of information about phytoplankton groups and total chlorophyll is available from the ESA OC-CCI (ocean colour - climate change initiative) dataset.

For any assimilation system to be successful it is important to accurately represent all sources of data uncertainty. For the ocean colour product, the propagation of errors throughout the ocean colour algorithm makes the characterisation of the uncertainty challenging. However, the problem can be simplified by assuming that the uncertainty is a function of optical water type (OWT), which characterises the water column of each observed pixel in terms of its reflectance properties.

Within this work we apply the well-known Desroziers et al. (2005) consistency diagnostics to the Met Office's NEMOVAR 3D-VAR DA system used to create daily biogeochemistry forecasts on the North-West European Shelf. The derived estimates of monthly ocean colour error covariances stratified by OWT are compared to previously derived estimates of the root mean square errors and biases using in-situ data match-ups (Brewin et al. 2017). It is found that the agreement between the two estimates of the error variances has a strong seasonal and OWT dependence. The error correlations (which can only be estimated with the Desroziers method) are in some instances found to be significant out to a few hundred kilometres, particularly for more turbid waters during the spring bloom. The reliability and limitations of these two estimates of the ocean colour uncertainty are discussed, along with the implications for the future assimilation of ocean colour products and for ecosystem and climate studies.
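The Desroziers et al. (2005) diagnostic estimates the observation-error covariance from the cross-statistics of background and analysis departures, R ≈ E[d_oa d_ob^T]. A minimal sketch on synthetic departures (the toy network size, error variances and sample count are illustrative, not those of the NEMOVAR system):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-observation network over many cycles; the true observation-error
# variance is 0.5 (uncorrelated), which the diagnostic should recover.
n_obs, n_cycles = 3, 20000
R_true = 0.5 * np.eye(n_obs)
B_true = np.eye(n_obs)  # background error mapped into observation space

# In a consistent linear-Gaussian system the analysis departure satisfies
# d_oa = R (B + R)^{-1} d_ob; we use this to manufacture consistent samples.
d_ob = rng.multivariate_normal(np.zeros(n_obs), B_true + R_true, n_cycles).T
d_oa = (R_true @ np.linalg.inv(B_true + R_true)) @ d_ob

# Desroziers et al. (2005) diagnostic: R ≈ E[d_oa d_ob^T]
R_est = (d_oa @ d_ob.T) / n_cycles
print(np.round(np.diag(R_est), 2))
```

Stratifying such statistics by optical water type, as in the abstract, amounts to computing R_est separately over the cycles and pixels assigned to each OWT.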

How to cite: Fowler, A., Skákala, J., and Ciavatta, S.: Quantifying uncertainty in the ESA Ocean Colour – Climate Change Initiative dataset for assimilation of total chlorophyll and phytoplankton functional types, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18684, https://doi.org/10.5194/egusphere-egu2020-18684, 2020.

D2857 |
William Crawford, Sergey Frolov, Justin McLay, Carolyn Reynolds, Craig Bishop, Benjamin Ruston, and Neil Barton

The presented work will illustrate the impact of analysis correction based additive inflation (ACAI) on atmospheric forecasts. ACAI uses analysis corrections from the NAVGEM data assimilation system as a representation of model error and is shown to simultaneously improve ensemble spread-skill, reduce model bias and improve the RMS error in the ensemble mean. Results are presented from a wide range of experiments exercising ACAI in stand-alone NAVGEM forecasts using two different ensemble systems: (1) the current operational EPS at FNMOC based on the ensemble transform method and (2) the Navy-ESPC EPS based on perturbed observations. The method of relaxation-to-prior-perturbations (RTPP) has also been implemented in the Navy-ESPC EPS and is shown to further improve the ensemble spread-skill relationship by allowing variance generated during the forecast to impact the initial-time ensemble variance in the subsequent cycle. Results from a simplified implementation of ACAI in the NAVGEM deterministic system will also be shown and indicate a positive impact on model biases and RMSE.
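The two ingredients named above, additive inflation drawn from a library of analysis corrections and RTPP, can be sketched schematically. All array sizes, the increment library and the scaling/relaxation factors below are illustrative assumptions, not the NAVGEM configuration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_state, n_ens = 100, 20

# A library of past analysis increments (synthetic here); ACAI draws from
# such a library as samples of model error and adds them to each member.
increment_library = rng.normal(0.0, 0.3, size=(500, n_state))

def acai_perturb(members, scale=0.5):
    """Add a randomly drawn, scaled analysis correction to each member."""
    draws = increment_library[rng.integers(0, len(increment_library), len(members))]
    return members + scale * draws

def rtpp(prior, posterior, alpha=0.5):
    """Relaxation to prior perturbations: blend posterior perturbations
    back toward the (usually larger) prior perturbations."""
    prior_pert = prior - prior.mean(axis=0)
    post_pert = posterior - posterior.mean(axis=0)
    return posterior.mean(axis=0) + (1.0 - alpha) * post_pert + alpha * prior_pert

prior = rng.normal(0.0, 1.0, size=(n_ens, n_state))
posterior = prior * 0.4          # stand-in for an over-tightened analysis
perturbed = acai_perturb(prior)
relaxed = rtpp(prior, posterior)
print(perturbed.shape, relaxed.std() > posterior.std())
```

In the abstract's terms, ACAI acts during the forecast while RTPP acts at analysis time, letting forecast-generated variance carry into the next cycle.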

How to cite: Crawford, W., Frolov, S., McLay, J., Reynolds, C., Bishop, C., Ruston, B., and Barton, N.: Accounting for model error in atmospheric forecasts, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-20271, https://doi.org/10.5194/egusphere-egu2020-20271, 2020.

D2858 |
Yvonne Ruckstuhl and Tijana Janjic

We investigate the feasibility of addressing model error by perturbing and estimating uncertain static model parameters using the localized ensemble transform Kalman filter. In particular we use the augmented state approach, where parameters are updated by observations via their correlation with observed state variables. This online approach offers a flexible, yet consistent way to better fit model variables affected by the chosen parameters to observations, while ensuring feasible model states. We show in a nearly-operational convection-permitting configuration that the prediction of clouds and precipitation with the COSMO-DE model is improved if the two-dimensional roughness length parameter is estimated with the augmented state approach. Here, the targeted model error is the roughness length itself and the surface fluxes, which influence the initiation of convection. At analysis time, Gaussian noise with a specified correlation matrix is added to the roughness length to regulate the parameter spread. In the northern part of the COSMO-DE domain, where the terrain is mostly flat and assimilated surface wind measurements are dense, estimating the roughness length led to improved forecasts of clouds and precipitation for up to six hours. In the southern part of the domain, the parameter estimation was detrimental unless the correlation length scale of the Gaussian noise added to the roughness length was increased. The impact of the parameter estimation was found to be larger when synoptic forcing was weak and the model output was more sensitive to the roughness length.
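The augmented state idea, updating an unobserved parameter through its ensemble correlation with an observed state variable, can be sketched on a scalar toy problem. This uses a plain stochastic EnKF update rather than the LETKF of the study, and all numbers (true parameter, noise levels, ensemble size) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n_ens = 200

# Toy setup: an observed state x depends on an unobserved parameter p
# (here x = 2*p + noise). The EnKF update of the augmented vector [x, p]
# corrects p purely through its sample correlation with x.
p_ens = rng.normal(1.0, 0.5, n_ens)          # prior parameter ensemble
x_ens = 2.0 * p_ens + rng.normal(0.0, 0.1, n_ens)

obs, obs_err = 3.0, 0.2                      # observation of x (true p = 1.5)
Z = np.vstack([x_ens, p_ens])                # augmented ensemble, shape (2, n_ens)
A = Z - Z.mean(axis=1, keepdims=True)
Pf = A @ A.T / (n_ens - 1)                   # augmented forecast covariance

H = np.array([[1.0, 0.0]])                   # only x is observed
S = H @ Pf @ H.T + obs_err**2
K = Pf @ H.T / S                             # Kalman gain, shape (2, 1)

perturbed_obs = obs + rng.normal(0.0, obs_err, n_ens)
Z_a = Z + K @ (perturbed_obs - H @ Z)        # both x and p are updated
print(p_ens.mean(), Z_a[1].mean())
```

In the COSMO-DE application the role of x is played by assimilated surface wind observations and the role of p by the two-dimensional roughness length field, with localization and added correlated noise controlling the parameter spread.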

How to cite: Ruckstuhl, Y. and Janjic, T.: Combined state-parameter estimation with the LETKF for convective-scale weather forecasting, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-7163, https://doi.org/10.5194/egusphere-egu2020-7163, 2020.

D2859 |
Marc Bocquet, Julien Brajard, Alberto Carrassi, and Laurent Bertino

The reconstruction from observations of the dynamics of high-dimensional chaotic models such as geophysical fluids is hampered by (i) the inevitably partial and noisy observations that can realistically be obtained, (ii) the need and difficulty to learn from long time series of data, and (iii) the unstable nature of the dynamics. To achieve such inference from the observations over long time series, it has recently been suggested to combine data assimilation and machine learning in several ways. We first rigorously show how to unify these approaches from a Bayesian perspective, yielding a non-trivial loss function.

Existing techniques to optimize the loss function (or simplified variants thereof) are re-interpreted here as coordinate descent schemes. The expectation-maximization (EM) method is used to estimate jointly the most likely model and model error statistics. The main algorithm alternates two steps: first, a posterior ensemble is derived using a traditional data assimilation step with an ensemble Kalman smoother (EnKS); second, both the surrogate model and the model error are updated using machine learning tools, a quasi-Newton optimizer, and analytical formulae. In our case, the spatially extended surrogate model is formalized as a neural network with convolutional layers leveraging the locality of the dynamics.
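The alternation itself can be sketched on a deliberately tiny scalar problem. Here the "surrogate model" is a linear map x → a·x learned from noisy observations of a decaying trajectory; the smoother and the machine-learning step are drastically simplified stand-ins for the EnKS and the network training, and every number below is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic truth and observations: x_{k+1} = a_true * x_k + small noise.
a_true = 0.9
traj = [1.0]
for _ in range(30):
    traj.append(a_true * traj[-1] + rng.normal(0.0, 0.005))
obs = np.array(traj) + rng.normal(0.0, 0.02, len(traj))

a_hat = 0.5                                  # initial (poor) surrogate model
for _ in range(20):
    # "DA step": crude smoothing of the trajectory under the current
    # surrogate (stand-in for the EnKS posterior ensemble mean).
    x_s = obs.copy()
    for k in range(1, len(x_s)):
        x_s[k] = 0.5 * obs[k] + 0.5 * a_hat * x_s[k - 1]
    # "ML step": refit the surrogate to the smoothed trajectory
    # (scalar least squares, the analogue of retraining the network).
    a_hat = float(x_s[1:] @ x_s[:-1]) / float(x_s[:-1] @ x_s[:-1])

print(round(a_hat, 3))
```

The full scheme replaces the smoothing line with an EnKS over a high-dimensional state and the least-squares line with gradient-based training of a convolutional network, while also updating the model error statistics.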

This scheme has been successfully tested on two low-order chaotic models with distinct identifiability, namely the 40-variable and the two-scale Lorenz models. Additionally, an approximate algorithm is tested to mitigate the numerical cost, yielding similar performance. Using indicators that probe short-term and asymptotic properties of the surrogate model, we investigate the sensitivity of the inference to the length of the training window, to the observation error magnitude, to the density of the monitoring network, and to the lag of the EnKS. In these iterative schemes, model error statistics are automatically adjusted to the improvement of the surrogate model dynamics. The outcome of the minimization is not only a deterministic surrogate model but also its associated stochastic correction, representative of the uncertainty attached to the deterministic part and which accounts for residual model errors.

How to cite: Bocquet, M., Brajard, J., Carrassi, A., and Bertino, L.: Bayesian inference of dynamics from partial and noisy observations using data assimilation and machine learning, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-15517, https://doi.org/10.5194/egusphere-egu2020-15517, 2020.

D2860 |
Jose M Gonzalez-Ondina, Lewis Sampson, and Georgy Shapiro

Current operational ocean modelling systems often use variational data assimilation (DA) to improve the skill of the ocean predictions by combining the numerical model with observational data. Many modern methods are derivatives of the objective (optimal) interpolation techniques developed by L. S. Gandin in the 1950s, which require computation of the background error covariance matrix (BECM), and much research has been devoted to overcoming the difficulties surrounding its calculation and improving its accuracy. In practice, due to time and memory constraints, the BECM is never fully computed. Instead, a simplified model is used, where the correlation at each point is modelled using a simple function while the variance and length scales are computed using error estimation methods such as Hollingsworth-Lonnberg or the NMC (National Meteorological Center) method. Usually, the correlation is assumed to be horizontally isotropic, or to have a predefined anisotropy based on latitude. However, observations indicate that horizontal diffusion is sometimes anisotropic, hence this has to be propagated into the BECM. It is suggested that including these anisotropies would improve the accuracy of the model predictions.
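The modelled-BECM baseline described above, pointwise error variances combined with a simple isotropic correlation function, can be sketched as follows. The 1-D transect, variances and length scale are illustrative only:

```python
import numpy as np

# Minimal sketch of a modelled BECM: an isotropic Gaussian correlation
# function scaled by pointwise background-error standard deviations.
x = np.linspace(0.0, 500.0, 11)              # 1-D transect of grid points, km
sigma_b = np.full_like(x, 0.8)               # background-error std dev
L = 100.0                                    # correlation length scale, km

dist = np.abs(x[:, None] - x[None, :])       # pairwise separations
corr = np.exp(-0.5 * (dist / L) ** 2)        # isotropic Gaussian correlation
B = np.outer(sigma_b, sigma_b) * corr        # background error covariance

print(B[0, 0], B[0, 1] > B[0, 2])
```

The anisotropic extension proposed in the abstract replaces the scalar separation `dist / L` with a direction-dependent metric estimated from observations.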

We present a new method to compute the BECM which allows horizontally anisotropic components to be extracted from observational data. Our method, unlike current techniques, is fundamentally multidimensional and can be applied to 2D or 3D sets of un-binned data. It also works better than other methods when observations are sparse, so there is no penalty for extracting the additional anisotropic components from the data.

Data Assimilation tools like NEMOVar use a matrix decomposition technique for the BECM in order to minimise the cost function. Our method is well suited to work with this type of decomposition, producing the different components of the decomposition which can be readily used by NEMOVar.

We have been able to show the spatial stability of our method in quantifying anisotropy in areas of sparse observations, while also demonstrating the importance of including an anisotropic representation within the background error. Using the coastal regions of the Arabian Sea, it is possible to analyse where improvements to diffusion can be included. Further extensions of this method could lead to a fully anisotropic diffusion operator for the calculation of the BECM in NEMOVar; however, further testing and optimization are needed to correctly implement this in operational assimilation systems.

How to cite: Gonzalez-Ondina, J. M., Sampson, L., and Shapiro, G.: A new method for computing horizontally anisotropic background error covariance matrices for data assimilation in ocean models, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-2686, https://doi.org/10.5194/egusphere-egu2020-2686, 2020.

D2861 |
Ziqing Zu, Xueming Zhu, and Hui Wang

Based on ROMS and the Ensemble Optimal Interpolation (EnOI) method, the South China Sea Operational Oceanography Forecasting System (SCSOFS) has been implemented at the National Marine Environmental Forecasting Center (NMEFC) to provide five-day forecasts of currents, temperature and salinity in the South China Sea. Recently, a systematic modification has been carried out on SCSOFS to improve its forecast skill.

For the data assimilation system, new methods have been implemented, such as the Incremental Analysis Update (IAU) and First Guess at Appropriate Time (FGAT), a high-pass filter to evaluate the background error, the assimilation of multi-source observations, and a non-uniform localization radius. The respective contribution of each method will also be discussed.
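Of these, IAU is easy to illustrate: instead of adding the analysis increment to the state at once, it is applied as a constant forcing spread evenly over the assimilation window, which suppresses initialization shocks. A minimal sketch with a toy linear model (dynamics, step counts and increment values are illustrative, not the SCSOFS configuration):

```python
import numpy as np

def model_step(x, dt):
    """Toy dynamics dx/dt = -0.1 x, integrated with forward Euler."""
    return x + dt * (-0.1 * x)

def forecast_with_iau(x0, increment, n_steps, dt):
    """Apply 1/n_steps of the analysis increment after each model step."""
    x = x0.copy()
    for _ in range(n_steps):
        x = model_step(x, dt)
        x = x + increment / n_steps          # distribute increment over window
    return x

x0 = np.array([1.0, 2.0])
increment = np.array([0.5, -0.5])
x_iau = forecast_with_iau(x0, increment, n_steps=10, dt=0.1)
# Same increment added in one shot at the window start, for comparison:
x_direct = forecast_with_iau(x0 + increment, np.zeros(2), 10, 0.1)
print(x_iau, x_direct)
```

Both runs end near the same state, but the IAU trajectory avoids the discontinuity of the one-shot correction.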

An optimization system was implemented for estimating the values of physical parameters in ROMS, to remove the long-term bias of the simulation. Argo temperature profiles were assimilated in the first half of 2017 to obtain optimal coefficients of horizontal/vertical viscosity/diffusion and the linear bottom drag. An independent validation from July 2017 to December 2018 shows that the simulation is improved using the optimal values.

How to cite: Zu, Z., Zhu, X., and Wang, H.: On the modification of operational oceanography forecasting system for South China Sea in National Marine Environmental Forecasting Center of China, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-6241, https://doi.org/10.5194/egusphere-egu2020-6241, 2020.

D2862 |
Youmin Tang and Yaling Wu

In this study, we developed a flow-dependent, ensemble-based targeted observation method by minimizing the analysis error variance within the framework of an Ensemble Kalman filter (EnKF) data assimilation system. This method estimates the background error statistics as a flow-dependent function. Covariance localization is also introduced for computational efficiency and to suppress spurious correlations. As a test bed, an optimal observation array of sea level anomalies (SLA) is designed for seasonal prediction over the tropical Indian Ocean (TIO) region. Furthermore, observing system simulation experiments (OSSEs) are used to verify the resulting optimal observational array using our recently developed coupled data assimilation system. A comparison between this flow-dependent method and the traditional method is also given.
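The core targeting criterion, ranking candidate observation sites by how much a single observation there would reduce the total analysis error variance under the flow-dependent ensemble covariance, can be sketched as follows. The state size, ensemble size and the artificially inflated "active region" are illustrative assumptions, not the authors' TIO configuration:

```python
import numpy as np

rng = np.random.default_rng(3)
n_state, n_ens = 50, 40

# Synthetic forecast ensemble with one dynamically active (high-variance)
# region, mimicking flow-dependent background error statistics.
X = rng.normal(size=(n_state, n_ens))
X[20:30] *= 3.0                              # the active region
A = X - X.mean(axis=1, keepdims=True)
Pf = A @ A.T / (n_ens - 1)                   # ensemble forecast covariance

obs_var = 1.0
def variance_reduction(site):
    """Trace reduction of Pa = Pf - K H Pf for one observation at `site`
    (H = e_site^T), i.e. the drop in total analysis error variance."""
    col = Pf[:, site]                        # this is Pf H^T
    return (col @ col) / (Pf[site, site] + obs_var)

scores = np.array([variance_reduction(i) for i in range(n_state)])
best_site = int(np.argmax(scores))
print(best_site)
```

Localization, omitted here for brevity, would taper `col` with distance from the candidate site before computing the score; sequentially repeating the argmax after updating Pf yields a full observation array.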

How to cite: Tang, Y. and Wu, Y.: A Flow-dependent Targeted Observation Method, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-6596, https://doi.org/10.5194/egusphere-egu2020-6596, 2020.