Using artificial intelligence to automate and expedite the harmonization of environmental data

Tyler Karns; Cedric Hagen; Krutika Deshpande; Michael SanClements; Christine Laney; Benjamin Ruddell; Henry Loescher; Tyson Swetnam

DOI: https://doi.org/10.5194/egusphere-egu26-12357

Session ESSI2.5

EGU26-12357, updated on 14 Mar 2026


EGU General Assembly 2026

© Author(s) 2026. This work is distributed under the Creative Commons Attribution 4.0 License.


Tyler Karns¹, Cedric Hagen², Krutika Deshpande³, Michael SanClements², Christine Laney², Benjamin Ruddell³, Henry Loescher², and Tyson Swetnam⁴


¹AS&T, Battelle Memorial Institute, United States of America
²National Ecological Observatory Network, Battelle Memorial Institute, United States of America
³School of Informatics, Computing & Cyber Systems, Northern Arizona University, United States of America
⁴CyVerse, University of Arizona, United States of America

Data harmonization, the process of unifying disparate datasets into compatible formats and comparable units, is critical for global environmental research but remains prohibitively time-consuming and expensive. Although many global environmental datasets could be assembled from existing data, potentially offering transformative insight into pressing environmental issues, the exhaustive effort required to harmonize those data is currently infeasible within most scientific funding cycles. For example, cross-network studies, such as those spanning the U.S. National Ecological Observatory Network (NEON), the European Integrated Carbon Observation System (ICOS), and the Australian Terrestrial Ecosystem Research Network (TERN), require weeks to years of manual schema mapping, unit conversion, alignment, and quality-flag standardization, among other tasks, even for a small number of data products, before any analysis can begin. Here, we present a large language model (LLM)-based agentic system designed to automate many of these harmonization steps by leveraging semantic understanding of scientific metadata and documentation. The system is designed to ingest raw datasets and metadata, interpret variable semantics within their scientific context, and generate tailored transformation pipelines. We tune this approach using a subset of environmental data that had previously been harmonized manually, drawn from NEON, ICOS, and TERN, as well as from the South African Environmental Observation Network (SAEON) and the Integrated European Long-Term Ecosystem, Critical Zone and Socio-Ecological Research Infrastructure (eLTER). This work is part of an effort by the Global Ecosystem Research Infrastructure (GERI) to build globally harmonized ecological drought datasets. Using these harmonized ecological drought datasets from across the globe, we test the efficacy of the LLM-based agentic system by measuring its accuracy, time and labor efficiency, and preservation of data integrity relative to manual harmonization workflows.
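The abstract does not describe the system's internals or what its generated pipelines look like. The Python below is a minimal sketch that assumes an agent's output could take the form of a declarative mapping specification. It covers three of the harmonization steps named above: schema mapping, unit conversion, and quality-flag standardization. It also includes one possible accuracy check against a manually harmonized reference.

Every variable name, unit string, network label, flag vocabulary, and the field-level agreement metric is a hypothetical illustration. None of these are actual NEON, ICOS, TERN, SAEON, or eLTER conventions. The LLM step that would produce the mapping is not shown; the spec here is written by hand.

```python
import math
from dataclasses import dataclass
from typing import Callable

# Hypothetical unit conversions into a shared target unit, keyed by
# (source_unit, target_unit). Unit strings are illustrative only.
UNIT_CONVERSIONS: dict[tuple[str, str], Callable[[float], float]] = {
    ("percent", "m3 m-3"): lambda v: v / 100.0,
    ("m3 m-3", "m3 m-3"): lambda v: v,
    ("degC", "K"): lambda v: v + 273.15,
    ("K", "K"): lambda v: v,
}

# Hypothetical per-network quality-flag vocabularies mapped onto one shared scheme.
QC_MAPS: dict[str, dict[object, str]] = {
    "network_a": {0: "good", 1: "bad"},
    "network_b": {"ok": "good", "suspect": "suspect", "fail": "bad"},
}


@dataclass(frozen=True)
class FieldMapping:
    """One schema-mapping rule: a source variable renamed and converted to a target."""
    source_name: str
    target_name: str
    source_unit: str
    target_unit: str


def harmonize_record(record: dict, mappings: list[FieldMapping],
                     qc_key: str, network: str) -> dict:
    """Apply schema mapping, unit conversion, and QC-flag standardization to one record."""
    out: dict = {}
    for m in mappings:
        convert = UNIT_CONVERSIONS[(m.source_unit, m.target_unit)]
        out[m.target_name] = convert(record[m.source_name])
    # Flags with no known mapping are marked "unmapped" rather than silently dropped.
    out["qc"] = QC_MAPS[network].get(record[qc_key], "unmapped")
    return out


def field_accuracy(auto: list[dict], manual: list[dict]) -> float:
    """Fraction of reference fields that the automated output reproduces."""
    if len(auto) != len(manual):
        raise ValueError("automated and manual outputs must have the same length")
    matches = total = 0
    for a, ref_row in zip(auto, manual):
        for key, ref in ref_row.items():
            total += 1
            val = a.get(key)
            if isinstance(ref, float) and isinstance(val, float):
                # Tolerate floating-point rounding introduced by unit conversion.
                matches += math.isclose(val, ref, rel_tol=1e-9)
            else:
                matches += val == ref
    return matches / total if total else 0.0


# Usage: a mapping spec of the kind an agent might emit (hand-written here).
network_a_mappings = [
    FieldMapping("SWC_pct", "soil_water_content", "percent", "m3 m-3"),
    FieldMapping("T_air", "air_temperature", "degC", "K"),
]
raw = [{"SWC_pct": 25.0, "T_air": 20.0, "flag": 0},
       {"SWC_pct": 31.0, "T_air": 18.5, "flag": 7}]
auto = [harmonize_record(r, network_a_mappings, "flag", "network_a") for r in raw]
manual = [{"soil_water_content": 0.25, "air_temperature": 293.15, "qc": "good"},
          {"soil_water_content": 0.31, "air_temperature": 291.65, "qc": "bad"}]
print(f"field-level agreement: {field_accuracy(auto, manual):.3f}")  # 5 of 6 fields match
```

In this toy run, the second record carries a flag value (7) that the hypothetical `network_a` vocabulary does not define. The record is therefore marked `unmapped` and disagrees with the manual reference, which yields an agreement of 0.833.

Keeping the transformation as data rather than free-form code is one design option, not necessarily the authors' choice. It would let a reviewer audit each rename, conversion, and flag mapping before it is applied.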
Pressing global environmental challenges require rapid synthesis of global environmental data. By reducing data harmonization time from months to hours, these artificial intelligence (AI) tools will enable scientists to focus on analysis and modeling rather than data wrangling, ultimately accelerating research in these critical areas of global environmental science.

How to cite: Karns, T., Hagen, C., Deshpande, K., SanClements, M., Laney, C., Ruddell, B., Loescher, H., and Swetnam, T.: Using artificial intelligence to automate and expedite the harmonization of environmental data, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12357, https://doi.org/10.5194/egusphere-egu26-12357, 2026.