EGU23-2206
https://doi.org/10.5194/egusphere-egu23-2206
EGU General Assembly 2023
© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

A novel machine learning national model for diffuse source total phosphorus concentrations in streams

Brian Kronvang, Jørgen Windolf, Henrik Tornberg, Jonas Rolighed, and Søren Larsen
  • Department of Ecoscience, Aarhus University, C.F. Møllers Allé 3, 8000 Aarhus, Denmark (bkr@ecos.au.dk)

Data on diffuse source annual flow-weighted total phosphorus (TP) concentrations from 349 Danish streams draining smaller catchments (< 50 km²) for the period 1990-2019 were used to develop a model in machine learning software (DataRobot version 6.2; DataRobot Inc., Boston, MA, USA). The diffuse source TP-concentration model will replace an older model that has been used to calculate P loadings to Danish estuaries from ungauged areas. A total of 207 streams with 3,144 annual observations of flow-weighted TP concentrations, together with information on 19 explanatory variables, were entered into the DataRobot software. DataRobot divides the input data into three partitions: a training dataset (64%), a validation dataset (16%) and a holdout dataset (20%). DataRobot then conducts a five-fold cross-validation and tests 72 different model types before suggesting the best final solutions.
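
The partition percentages can be illustrated with a minimal Python sketch. It assumes a 20% holdout with the remaining 80% dealt into five equal cross-validation folds, which is one reading consistent with the 64%/16%/20% figures above. The abstract does not state whether observations were split individually or grouped by stream, so the random per-observation split and the names `partition_indices` and `cv_splits` are illustrative assumptions, not DataRobot's internal procedure.

```python
import random


def partition_indices(n_obs, holdout_frac=0.20, n_folds=5, seed=0):
    """Split observation indices into a holdout set and n_folds CV folds.

    Assumption: a simple random split per observation (not grouped by stream).
    """
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)
    n_holdout = round(n_obs * holdout_frac)
    holdout = indices[:n_holdout]
    remaining = indices[n_holdout:]
    # Deal the remaining indices round-robin into folds of near-equal size.
    folds = [remaining[i::n_folds] for i in range(n_folds)]
    return holdout, folds


def cv_splits(folds):
    """Yield (training, validation) index lists, one pair per fold."""
    for k, validation in enumerate(folds):
        training = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield training, validation


if __name__ == "__main__":
    # 3,144 is the number of annual observations reported in the abstract.
    holdout, folds = partition_indices(3144)
    for training, validation in cv_splits(folds):
        print(len(training), len(validation), len(holdout))
```

With 3,144 observations, each fold in this sketch trains on about 64% of the data and validates on about 16%, with the 20% holdout never used during model selection.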

In this case, the TP-concentration model was developed as an ‘eXtreme Gradient Boosted Trees Regressor with early stopping’, which the DataRobot software suggested as superior for modelling the annual flow-weighted TP concentration based on 13 explanatory variables. The most influential explanatory variables in the final model are: 1) tile drainage in the catchments; 2) ; 3) period (two periods with different sampling regimes); 4) proportion of agricultural land; 5) importance of bank erosion; 6) deviation of annual runoff from the long-term mean. The final TP-concentration model has an R² = 0.69 for the training dataset, R² = 0.71 for the validation dataset and R² = 0.67 for the holdout dataset.
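
As a reminder of what the R² values measure, the following is a minimal Python sketch of the coefficient of determination. The abstract does not say which definition of R² DataRobot reports, so the form 1 - SS_res / SS_tot is an assumption here, and the function name `r_squared` and the example concentrations are hypothetical.

```python
def r_squared(observed, simulated):
    """Coefficient of determination, 1 - SS_res / SS_tot.

    Assumption: this is the R2 definition behind the reported values.
    """
    if len(observed) != len(simulated) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared differences between observed and simulated.
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Total sum of squares: squared differences between observed and their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R2 is undefined")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical annual flow-weighted TP concentrations (mg/L), not study data.
    observed = [0.08, 0.12, 0.10, 0.15, 0.09]
    simulated = [0.09, 0.11, 0.10, 0.13, 0.10]
    print(round(r_squared(observed, simulated), 2))
```

Under this definition, a model that always predicts the observed mean scores 0 and a perfect model scores 1.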

A validation of the new machine learning TP-concentration model on 142 independent streams with 1,261 annual observations was conducted to investigate the uncertainty of the model simulations. The validation showed that the TP-concentration model has high explanatory power (R² = 0.60) and very good simulation performance in the nine Danish georegions, as well as over the 30-year time series of data.

An application of the model for calculating flow-weighted TP concentrations within nearly 3,200 catchment polygons (ID15s) covering the Danish land area showed that the newly developed machine learning TP model is a valuable tool both for calculating TP loadings from ungauged areas to lakes and coastal waters and for linking catchment pressures to stream ecological status.
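
The step from a modelled flow-weighted concentration to a loading can be sketched in Python as a simple mass balance: load equals concentration times runoff volume. The abstract does not describe the loading calculation itself, so this is an illustration under that assumption; the function name `tp_load_kg` and the example values are hypothetical, and the runoff would have to come from a separate hydrological estimate.

```python
def tp_load_kg(tp_conc_mg_per_l, runoff_mm, area_km2):
    """Annual TP load (kg) from a flow-weighted concentration and runoff.

    Assumption: load = concentration x runoff volume (simple mass balance).
    """
    # Runoff depth (mm -> m) times catchment area (km2 -> m2) gives volume in m3.
    runoff_volume_m3 = runoff_mm / 1000.0 * area_km2 * 1.0e6
    # 1 mg/L equals 1 g/m3, so concentration times volume gives grams.
    load_g = tp_conc_mg_per_l * runoff_volume_m3
    return load_g / 1000.0


if __name__ == "__main__":
    # Hypothetical catchment: 0.1 mg/L TP, 300 mm annual runoff, 15 km2.
    print(round(tp_load_kg(0.1, 300.0, 15.0), 1))
```

With these units the conversion factors cancel, so the load in kg is numerically the product of the concentration in mg/L, the runoff in mm and the area in km².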

How to cite: Kronvang, B., Windolf, J., Tornberg, H., Rolighed, J., and Larsen, S.: A novel machine learning national model for diffuse source total phosphorus concentrations in streams, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-2206, https://doi.org/10.5194/egusphere-egu23-2206, 2023.