EGU24-9156, updated on 08 Mar 2024
https://doi.org/10.5194/egusphere-egu24-9156
EGU General Assembly 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

DataLabs: development of a cloud collaborative platform for open interdisciplinary geo-environmental sciences 

Michael Tso1, Michael Hollaway1, Faiza Samreen1, Iain Walmsley1, Matthew Fry2, John Watkins1, and Gordon Blair1
  • 1UK Centre for Ecology and Hydrology, Environmental Data Science, United Kingdom of Great Britain (mtso@ceh.ac.uk)
  • 2UK Centre for Ecology and Hydrology, Water Resources Systems, United Kingdom of Great Britain

Environmental scientists and practitioners increasingly need to create data-driven solutions to the environment's grand challenges. Doing so often requires combining data from disparate sources, applying advanced analytical methods, and drawing on the expertise of collaborative, cross-disciplinary teams [1]. Virtual labs allow scientists to collaboratively explore large or heterogeneous datasets, develop and share methods, and communicate their results to stakeholders and decision-makers.

DataLabs [2] has been developed as a cloud-based collaborative platform to tackle these challenges and promote open, collaborative, interdisciplinary geo-environmental sciences. It allows users to share notebooks (e.g. JupyterLab, RStudio, and most recently VS Code), datasets and computational environments, and it promotes transparency and end-to-end reasoning about model uncertainty. It supports FAIR access to data and digital assets by providing shared data stores and functionality for discovering the datasets and assets hosted on the platform's asset catalogue. Its tailorable design allows it to be adapted to different challenges and applications. It is also an excellent platform for large collaborative teams to work on outputs together [3] and for communicating results to stakeholders, as it allows easy prototyping and publishing of web applications (e.g. Shiny, Panel, Voilà). It is currently deployed on JASMIN [4] and is part of the UK NERC Environmental Data Service [5].

There is a growing number of use cases and requirements for DataLabs, and it is set to play a central part in several planned digital research infrastructure (DRI) initiatives. To further its vision, the platform's future development needs include a more intuitive onboarding experience, easier access to key datasets at source, better connectivity to other cloud platforms, and better use of workflow tools. DataLabs shares many of the features (e.g. heavy use of PANGEO core packages) and design principles of PANGEO. We would be interested in exploring commonalities and differences, sharing best practices, and growing the community of practice in this increasingly important area.

[1] Blair, G.S., Henrys, P., Leeson, A., Watkins, J., Eastoe, E., Jarvis, S., Young, P.J., 2019. Data Science of the Natural Environment: A Research Roadmap. Front. Environ. Sci. 7. https://doi.org/10.3389/fenvs.2019.00121

[2] Hollaway, M.J., Dean, G., Blair, G.S., Brown, M., Henrys, P.A., Watkins, J., 2020. Tackling the Challenges of 21st-Century Open Science and Beyond: A Data Science Lab Approach. Patterns 1, 100103. https://doi.org/10.1016/j.patter.2020.100103 

[3] https://eds.ukri.org/news/impacts/datalabs-streamlines-workflow-assessing-state-nature-uk  

[4] https://jasmin.ac.uk/  

[5] https://eds.ukri.org/news/impacts/datalabs-digital-collaborative-platform-tackling-environmental-science-challenges  

How to cite: Tso, M., Hollaway, M., Samreen, F., Walmsley, I., Fry, M., Watkins, J., and Blair, G.: DataLabs: development of a cloud collaborative platform for open interdisciplinary geo-environmental sciences, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-9156, https://doi.org/10.5194/egusphere-egu24-9156, 2024.