EGU25-2394, updated on 14 Mar 2025
https://doi.org/10.5194/egusphere-egu25-2394
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Tuesday, 29 Apr, 14:00–15:45 (CEST), Display time Tuesday, 29 Apr, 14:00–18:00
 
Hall X5, X5.144
Machine Learning to Construct Daily, Gap-Free, Long-Term Stratospheric Trace Gases Data Sets
Sandip Dhomse1,2 and Martyn Chipperfield1,2
  • 1University of Leeds, School of Earth and Environmental Sciences, United Kingdom of Great Britain – England, Scotland, Wales (s.s.dhomse@leeds.ac.uk)
  • 2University of Leeds, National Centre of Earth Observation, Leeds, United Kingdom of Great Britain – England, Scotland, Wales

Understanding the complex relationships between trace gases, as well as their various source and sink pathways in the atmosphere, requires good-quality, continuous and reliable data sets. However, obtaining comprehensive long-term profiles for key trace gases is a significant challenge. We have initiated a new research strand to construct long-term data sets using machine learning. Output from a chemical transport model (CTM) and observational data from satellite instruments (such as HALOE and ACE-FTS) are merged using machine learning. This integration yields daily, gap-free data sets for six crucial gases: ozone (O3), methane (CH4), hydrogen fluoride (HF), water vapour (H2O), hydrogen chloride (HCl), and nitrous oxide (N2O) from 1991 to 2021.

Chlorofluorocarbons (CFCs) are a critical source of chlorine that controls stratospheric ozone losses. Currently, ACE-FTS is the only instrument that provides sparse but daily measurements of these gases. Monitoring changes in these ozone-depleting substances, which are now banned, helps assess the effectiveness of the Montreal Protocol. We have initiated the construction of gap-free stratospheric profile data for CFC-11 as a subsequent step.

We use a regression model to estimate the relationship between various tracers in a CTM and the differences between the CTM output field and the observations, assuming all errors are due to the CTM setup. Once the regression model has been trained on the observational collocations, it is used to estimate biases for all the CTM grid points. To enhance accuracy, we tested various regression models and found that XGBoost regression outperforms the other methods. ACE-FTS v5.2 data (2004–present) are split into training (70%) and test (30%) subsets to train XGBoost and evaluate its performance.
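
The bias-correction workflow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: a plain least-squares linear regression stands in for XGBoost so that the example needs nothing beyond the Python standard library, all data are synthetic, the bias is taken as observation minus CTM (the sign convention is an assumption), and every name (fit_linear, predict, features, grid_features, and so on) is hypothetical.

```python
import math
import random

def fit_linear(X, y):
    """Least-squares fit with intercept (a simple stand-in for XGBoost)."""
    rows = [[1.0] + list(x) for x in X]          # prepend intercept column
    n = len(rows[0])
    # Normal equations: (X^T X) coef = X^T y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for k in range(col + 1, n):
            f = A[k][col] / A[col][col]
            for j in range(col, n):
                A[k][j] -= f * A[col][j]
            b[k] -= f * b[col]
    # Back substitution
    coef = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (b[i] - s) / A[i][i]
    return coef

def predict(coef, X):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], x)) for x in X]

def rmse(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

rng = random.Random(0)

# Synthetic collocations: two CTM tracer values per sample act as predictors.
features = [(rng.uniform(0.5, 2.0), rng.uniform(0.1, 0.4)) for _ in range(200)]
ctm = [1.5 * a for a, _ in features]             # CTM value of the target gas
# Synthetic "observations": CTM plus a tracer-dependent bias plus noise.
obs = [c + 0.2 * a - 0.5 * b + rng.gauss(0.0, 0.01)
       for c, (a, b) in zip(ctm, features)]
bias = [o - c for o, c in zip(obs, ctm)]         # regression target (assumed sign)

# 70% / 30% train/test split of the collocations.
idx = list(range(len(features)))
rng.shuffle(idx)
n_train = (7 * len(idx)) // 10
train_idx, test_idx = idx[:n_train], idx[n_train:]

coef = fit_linear([features[i] for i in train_idx], [bias[i] for i in train_idx])

# Evaluate on the held-out collocations: raw CTM vs bias-corrected CTM.
test_obs = [obs[i] for i in test_idx]
test_ctm = [ctm[i] for i in test_idx]
test_pred = predict(coef, [features[i] for i in test_idx])
rmse_raw = rmse(test_obs, test_ctm)
rmse_corrected = rmse(test_obs, [c + d for c, d in zip(test_ctm, test_pred)])
print(f"RMSE raw: {rmse_raw:.4f}, corrected: {rmse_corrected:.4f}")

# Apply the trained model to every point of a (tiny, synthetic) CTM grid.
grid_features = [(0.5 + 0.5 * i, 0.1 + 0.1 * j) for i in range(4) for j in range(4)]
grid_ctm = [1.5 * a for a, _ in grid_features]
grid_corrected = [c + d for c, d in zip(grid_ctm, predict(coef, grid_features))]
```

The key design point is that the model is trained only where observations exist, but predicts a correction from CTM tracers alone, so it can be evaluated at every CTM grid point and time step, which is what makes a gap-free product possible.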

Our results demonstrate excellent agreement between the constructed profiles and satellite measurement-based data sets. Biases in the TOMCAT model and occultation measurement-based (TCOM) data sets, when compared to evaluation profiles, are consistently below 10% at mid-to-high latitudes and below 50% at low latitudes throughout the stratosphere. The constructed daily zonal mean profile data sets, spanning altitudes from 15 to 60 km (or pressure levels from 300 to 0.1 hPa), are publicly accessible through Zenodo repositories.

     CH4:     https://doi.org/10.5281/zenodo.7293740
     N2O:     https://doi.org/10.5281/zenodo.7386001
     HCl:     https://doi.org/10.5281/zenodo.7608194
     HF:      https://doi.org/10.5281/zenodo.7607564
     O3:      https://doi.org/10.5281/zenodo.7833154
     H2O:     https://doi.org/10.5281/zenodo.7912904
     CFC-11:  https://doi.org/10.5281/zenodo.11526073
     CFC-12:  https://doi.org/10.5281/zenodo.12548528
     COF2:    https://doi.org/10.5281/zenodo.12551268


In an upcoming iteration, we are enhancing the algorithm and adding more species to the current setup. We believe these data sets will provide valuable insights into the dynamics of stratospheric trace gases, furthering our understanding of their behaviour and impact on the climate.

References:

Dhomse, S. S., et al.: ML-TOMCAT: machine-learning-based satellite-corrected global stratospheric ozone profile data set from a chemical transport model, Earth Syst. Sci. Data, 13, 5711–5729, https://doi.org/10.5194/essd-13-5711-2021, 2021.

Dhomse, S. S. and Chipperfield, M. P.: Using machine learning to construct TOMCAT model and occultation measurement-based stratospheric methane (TCOM-CH4) and nitrous oxide (TCOM-N2O) profile data sets, Earth Syst. Sci. Data, 15, 5105–5120, https://doi.org/10.5194/essd-15-5105-2023, 2023.

How to cite: Dhomse, S. and Chipperfield, M.: Machine Learning to Construct Daily, Gap-Free, Long-Term Stratospheric Trace Gases Data Sets, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-2394, https://doi.org/10.5194/egusphere-egu25-2394, 2025.