EGU24-5002, updated on 08 Mar 2024
https://doi.org/10.5194/egusphere-egu24-5002
EGU General Assembly 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Geochemistry π: Automated Machine Learning Python Framework for Tabular Data

Jianming Zhao1, Johnny ZhangZhou1, Can He2, Yang Lyu1, and the ZJU Earth Data Group*
  • 1School of Earth Sciences, Zhejiang University, China (zhangzhou333@zju.edu.cn)
  • 2School of Computing, National University of Singapore, Singapore
  • *A full list of authors appears at the end of the abstract

Machine learning has significantly advanced geochemistry research, but its implementation can be arduous and time-consuming. In response to this challenge, we introduce Geochemistry π, an open-source automated machine learning Python framework. With Geochemistry π, geochemists can process tabulated data and run machine learning algorithms simply by selecting their preferred options. This streamlined process operates in a user-friendly question-and-answer format, eliminating the need for coding expertise. Following automatic or manual parameter adjustment, Geochemistry π provides users with performance metrics and predictive outcomes for their machine learning models. Built on the scikit-learn library, Geochemistry π provides a tailored automated workflow covering classification, regression, dimensionality reduction, and clustering algorithms. The framework's extensibility and portability are enhanced through a modular pipeline architecture that separates data handling from algorithm application. Geochemistry π's Auto Machine Learning module integrates the Cost-Frugal Optimization and Blended Search Strategy hyperparameter search methods from FLAML (A Fast and Lightweight Auto Machine Learning Library). Additionally, model parameter optimization is accelerated using the Ray distributed computing framework. Machine learning lifecycle management is handled through integration with the MLflow library, allowing users to compare multiple trained models at various scales and to manage generated data and visualizations. To enhance accessibility, Geochemistry π separates its front-end and back-end frameworks, culminating in a user-friendly web portal. This portal not only showcases the machine learning model but also presents the data science workflow, making it accessible to both researchers and developers.
In summary, Geochemistry π offers a robust Python framework that empowers users and developers to significantly enhance their data mining efficiency, with options for both online and offline operation.
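
The modular pipeline idea described above, in which data handling is kept separate from algorithm application and the user picks a model from a menu, can be illustrated with a minimal standard-library sketch. This is not Geochemistry π's actual API: every name below (`drop_missing`, `split_rows`, `MODELS`, `run_pipeline`, the two toy regressors, and the `ROWS` sample data) is hypothetical, and the toy models merely stand in for the scikit-learn estimators the framework wraps.

```python
from statistics import mean

# Hypothetical sample data: (feature, target) pairs following y = 2x + 1,
# with one incomplete row to exercise the data-handling step.
ROWS = [(0, 1), (1, 3), (None, 5), (2, 5), (3, 7),
        (4, 9), (5, 11), (6, 13), (7, 15), (8, 17)]


# --- Data-handling layer: knows nothing about models -------------------
def drop_missing(rows):
    """Remove rows whose feature or target is missing (None)."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def split_rows(rows, train_fraction=0.75):
    """Split rows, in order, into a training part and a test part."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# --- Algorithm layer: interchangeable models with fit/predict ----------
class MeanRegressor:
    """Baseline that always predicts the mean training target."""

    def fit(self, xs, ys):
        self.value = mean(ys)
        return self

    def predict(self, xs):
        return [self.value for _ in xs]


class LinearRegressor:
    """One-feature ordinary least squares."""

    def fit(self, xs, ys):
        mx, my = mean(xs), mean(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        if sxx == 0:
            raise ValueError("feature has zero variance")
        self.slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
        self.intercept = my - self.slope * mx
        return self

    def predict(self, xs):
        return [self.intercept + self.slope * x for x in xs]


# Menu of options; an interactive front end could show these labels and
# pass the user's answer (for example from input()) to run_pipeline.
MODELS = {
    "1": ("Mean baseline", MeanRegressor),
    "2": ("Linear regression", LinearRegressor),
}


def mean_squared_error(y_true, y_pred):
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def run_pipeline(rows, choice):
    """Clean the data, train the chosen model, and report test metrics."""
    name, factory = MODELS[choice]
    train, test = split_rows(drop_missing(rows))
    model = factory().fit([x for x, _ in train], [y for _, y in train])
    predictions = model.predict([x for x, _ in test])
    mse = mean_squared_error([y for _, y in test], predictions)
    return {"model": name, "mse": mse, "predictions": predictions}


if __name__ == "__main__":
    for key in MODELS:
        print(run_pipeline(ROWS, key))
```

Because the data-handling functions never reference a model and the models only expose `fit` and `predict`, adding a new algorithm in this sketch means adding one entry to `MODELS`, which is the kind of extensibility a modular pipeline is meant to give.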

ZJU Earth Data Group:

Jianming Zhao, J ZhangZhou, Can He, Yang Lyu, et al.

How to cite: Zhao, J., ZhangZhou, J., He, C., and Lyu, Y. and the ZJU Earth Data Group: Geochemistry π: Automated Machine Learning Python Framework for Tabular Data, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-5002, https://doi.org/10.5194/egusphere-egu24-5002, 2024.