EGU21-2293, updated on 21 Apr 2021
https://doi.org/10.5194/egusphere-egu21-2293
EGU General Assembly 2021
© Author(s) 2021. This work is distributed under
the Creative Commons Attribution 4.0 License.

Integration of long-term collocated ecological datasets: examples from the UK Environmental Change Network (ECN)

Chak-Hau Michael Tso1,4, Aaron Lowther2, Don Monteith1, Linsay Flynn Banin1,4, William Simm3, Susannah Rennie1,4, Michael Hollaway1,4, Peter Henrys1,4, Rebecca Killick1,4, John Watkins1,4, and Gordon S. Blair4,3
  • 1UK Centre for Ecology and Hydrology, Environmental Data Science, United Kingdom (mtso@ceh.ac.uk)
  • 2Department of Mathematics & Statistics, Lancaster University, Lancaster, UK
  • 3School of Computing and Communications, Lancaster University, Lancaster, United Kingdom
  • 4Centre of Excellence for Environmental Data Science, Lancaster University and UK Centre for Ecology and Hydrology, Lancaster, United Kingdom

It is increasingly recognized that a whole-system approach is needed to address many challenging environmental research questions. Although such an approach is increasingly adopted by integrating data and models from various sub-systems, applying it more widely across the environmental sciences requires infrastructure, methodologies, and a culture shift to facilitate seamless collaboration and re-deployment of workflows.

We report our recent progress in addressing some of these issues, focusing on work related to the UK Environmental Change Network (ECN, an eLTER member network). A transdisciplinary project team comprising environmental scientists, statisticians, and computer scientists collaborated through a virtual research platform (DataLabs). Within the DataLabs platform, all data and analysis code are stored centrally via a cloud service and are easily accessible through an internet browser from any operating system. Access to cloud computing resources for analyses is also available. More importantly, all users have access to the same versions of the data and software, running on the same hardware, throughout the collaboration process.

Such close collaboration allows us to co-develop statistical and data science algorithms that are not domain-specific and are generic enough to apply to a wide range of environmental datasets. Here we demonstrate how they are used to highlight periods of data with significant change. The first example is a "state tagging" algorithm, in which each point in time of a dataset is classified as belonging to an arbitrary state, based on clustering of covariates. Confidence intervals are then computed from the statistics of each state, and any data points that lie outside them are flagged for further investigation. The second example is an algorithm for identifying changepoints across multiple time series that have different sampling frequencies or misaligned sampling times. Existing multivariate changepoint algorithms assume that all time series are sampled at the same times, a situation not commonly applicable to environmental data. Our method removes this assumption and emerged from consultation and collaboration with domain scientists. It has many potential applications, such as confirming whether changepoints occur across sites, across multiple variables within sites, or combinations thereof. In the final example, we show how DataLabs can facilitate the acquisition and application of third-party data to improve understanding of ECN atmospheric deposition chemistry data. Specifically, it allows users to take advantage of cloud computing and storage and to collaborate seamlessly: collaborators do not need to maintain independent copies of software and data, which saves time and effort.
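
To make the state tagging idea concrete, the following is a minimal illustrative sketch in Python, not the implementation developed in the project. It assumes a basic k-means clustering of the covariates and treats "confidence interval" as the per-state mean plus or minus 1.96 standard deviations; both choices, and the function names tag_states and flag_outliers, are assumptions made purely for illustration.

```python
from statistics import mean, stdev


def tag_states(covariates, k, iters=100):
    """Assign each time point to one of k states with a basic k-means.

    Assumption: k-means stands in for whatever clustering is actually used.
    Covariates are used unscaled; real data would usually be standardised.
    """
    points = [tuple(row) for row in covariates]
    distinct = sorted(set(points))
    if k < 1 or k > len(distinct):
        raise ValueError("k must be between 1 and the number of distinct rows")
    # Deterministic start: centres spread evenly through the sorted rows.
    step = (len(distinct) - 1) / max(k - 1, 1)
    centres = [distinct[round(i * step)] for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centre by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centres[j])))
            for p in points
        ]
        # Update step: each centre moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(mean(col) for col in zip(*members)))
            else:
                updated.append(centres[j])  # keep an empty cluster's centre
        if updated == centres:
            break
        centres = updated
    return labels


def flag_outliers(values, labels, z=1.96):
    """Return indices of values outside mean +/- z * sd of their own state.

    Assumption: a normal-theory band per state; z=1.96 gives roughly 95%.
    """
    flagged = []
    for state in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == state]
        if len(idx) < 2:
            continue  # a spread cannot be estimated from a single point
        obs = [values[i] for i in idx]
        centre, spread = mean(obs), stdev(obs)
        flagged.extend(i for i in idx if abs(values[i] - centre) > z * spread)
    return sorted(flagged)
```

In this sketch, a measurement is judged only against other measurements taken under similar covariate conditions, so a value that would be unremarkable in one state can still be flagged in another. Note that a single extreme value inflates the standard deviation of its own state, so very short states may not flag anything.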

The developments reported here highlight the benefits of collaborative research using DataLabs to advance the integration of data, models, and methods across the environmental sciences. The platform provides the infrastructure, data, and culture that allow scientists to work more closely together, which in turn allows rapid incorporation of novel data science methods. It also allows the data integration workflows developed here to be applied more readily elsewhere, while stakeholders can view and manipulate the resulting data products.

 

How to cite: Tso, C.-H. M., Lowther, A., Monteith, D., Banin, L. F., Simm, W., Rennie, S., Hollaway, M., Henrys, P., Killick, R., Watkins, J., and Blair, G. S.: Integration of long-term collocated ecological datasets: examples from the UK Environmental Change Network (ECN), EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-2293, https://doi.org/10.5194/egusphere-egu21-2293, 2021.
