- BRGM, France (c.bejjit@brgm.fr)
The assessment of environmental and resource performance of energy transition technologies relies on quantitative information scattered across heterogeneous sources, including scientific articles, patents, and industrial reports, such as ESG (Environmental, Social, and Governance) disclosures. These documents contain key data for Life Cycle Inventory (LCI) and Material Flow Analysis (MFA), such as material and energy intensities, water consumption, mining production volumes, emissions, and technological descriptors. However, this information is predominantly embedded in unstructured PDF documents optimized for human reading, making large-scale, traceable data aggregation difficult and costly when performed manually.
This work presents an automated and modular methodology designed to extract and contextualize quantitative LCI and MFA data from three major categories of technical documentation. The approach combines large-scale document collection, relevance screening, and multimodal artificial intelligence within a reproducible and auditable workflow.
- Scientific Articles
Peer-reviewed articles are collected through automated scraping workflows based on structured search outputs. Documents are screened for LCI/MFA relevance using domain-specific keywords, methodological markers, and quantitative signal density. Relevant articles are then processed using a multimodal AI-based extraction core in which each page is analyzed through a combined text and image input. This enables robust extraction of numerical values from tables, text and figures while preserving contextual information such as units, methodological assumptions, and source location.
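The relevance screening step can be pictured with a minimal sketch. It assumes a simple additive score over keyword hits, methodological markers, and the density of number-plus-unit expressions; the term lists, weights, threshold, and the `screen_article` function are hypothetical illustrations, not the screening rules used in this work.

```python
import re

# Hypothetical term lists: the abstract does not publish the actual vocabulary.
DOMAIN_KEYWORDS = ("life cycle inventory", "material flow analysis",
                   "energy intensity", "water consumption")
METHOD_MARKERS = ("functional unit", "system boundary", "mass balance")

# A number followed by a unit-like token, e.g. "12.5 kWh", "3 t", "45%".
QUANTITY = re.compile(
    r"\b\d+(?:[.,]\d+)?\s?(?:kg|t|kWh|MJ|GJ|m3|L|%)(?![A-Za-z])"
)


def screen_article(text, min_score=3.0):
    """Score a document for LCI/MFA relevance (illustrative weights only)."""
    lowered = text.lower()
    keyword_hits = sum(lowered.count(k) for k in DOMAIN_KEYWORDS)
    marker_hits = sum(lowered.count(m) for m in METHOD_MARKERS)
    quantities = len(QUANTITY.findall(text))
    n_words = max(len(text.split()), 1)
    density = 1000 * quantities / n_words  # quantities per 1000 words
    score = keyword_hits + 2 * marker_hits + density / 10
    return {
        "keyword_hits": keyword_hits,
        "marker_hits": marker_hits,
        "quantities": quantities,
        "score": score,
        "relevant": score >= min_score,
    }
```

In such a scheme, a document that mentions a functional unit and reports several quantities with units would pass, while a purely narrative text would be set aside before the more expensive multimodal extraction step.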
- Patents
Patent documents carry information on emerging technology trends and on future metal uses. Patents are collected via dedicated scraping pipelines and processed separately from scientific articles. The workflow focuses on extracting and structuring patent metadata, including publication year, country, and technology class, in order to characterize technological activity related to energy transition technologies. While quantitative LCI/MFA extraction from patents is not yet systematically performed, the pipeline enables descriptive statistical analyses of patent dynamics, including temporal trends and geographical patterns of technological development.
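As an illustration of the descriptive statistics mentioned above, the sketch below counts patents by year, by country, and by year and technology class. The `PatentRecord` structure, the `summarize_patents` function, and the sample records are hypothetical; the records are invented toy values, not results from this work.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PatentRecord:
    # Hypothetical schema covering the three metadata fields named in the text.
    publication_year: int
    country: str
    technology_class: str


def summarize_patents(records):
    """Count patents by year, by country, and by (year, technology class)."""
    by_year = Counter(r.publication_year for r in records)
    by_country = Counter(r.country for r in records)
    by_year_class = Counter(
        (r.publication_year, r.technology_class) for r in records
    )
    return {
        "by_year": dict(sorted(by_year.items())),  # temporal trend
        "by_country": by_country.most_common(),    # geographical pattern
        "by_year_class": dict(by_year_class),
    }


# Invented toy records, for illustration only.
toy_records = [
    PatentRecord(2019, "JP", "H01M"),
    PatentRecord(2020, "CN", "H01M"),
    PatentRecord(2020, "CN", "C25B"),
]
summary = summarize_patents(toy_records)
```

Keeping the counts keyed by year and class makes it straightforward to plot one time series per technology class afterwards.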
- Mining Technical and ESG Reports
Official reports from mining companies, with a specific focus on ESG reports, are processed through a screening module that acts as a gatekeeper. The screening relies on sequential text parsing and, when necessary, geometric reconstruction of tables to identify reports containing sufficiently granular and structured quantitative information. Following human validation of the screening results, selected reports are analyzed using a multimodal AI vision–language model that combines page images and extracted text, enabling structured extraction of industrial metrics with associated context and traceability.
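The geometric reconstruction of tables can be sketched as follows, assuming that a PDF parser has already produced word boxes with a left coordinate and a top coordinate. Words are grouped into rows by vertical proximity and ordered left to right; a page is flagged as tabular when enough rows contain a numeric cell. The `Word` structure, the tolerance, the thresholds, and the sample values are hypothetical, and the actual module may work differently.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical word box, as a PDF text parser might provide it.
    text: str
    x0: float   # left edge
    top: float  # distance from the top of the page


def reconstruct_rows(words, y_tol=3.0):
    """Group words into rows by vertical position, then sort each row by x."""
    rows = []
    for w in sorted(words, key=lambda w: (w.top, w.x0)):
        if rows and abs(w.top - rows[-1]["top"]) <= y_tol:
            rows[-1]["words"].append(w)
        else:
            rows.append({"top": w.top, "words": [w]})
    return [
        [w.text for w in sorted(r["words"], key=lambda w: w.x0)]
        for r in rows
    ]


def _is_number(cell):
    try:
        float(cell.replace(",", ""))
        return True
    except ValueError:
        return False


def looks_tabular(rows, min_numeric_rows=2, min_cols=2):
    """Gatekeeper heuristic: enough multi-cell rows holding a numeric value."""
    numeric_rows = [
        r for r in rows
        if len(r) >= min_cols and any(_is_number(c) for c in r)
    ]
    return len(numeric_rows) >= min_numeric_rows


# Invented toy page content, for illustration only.
page_words = [
    Word("Metal", 10, 100), Word("Output_kt", 120, 101),
    Word("Copper", 10, 120), Word("512", 120, 119.5),
    Word("Nickel", 10, 140), Word("87", 120, 141),
]
table = reconstruct_rows(page_words)
```

A row-grouping heuristic of this kind is cheap to run on every page, which suits a gatekeeper placed before human validation and the vision–language model.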
This automated methodology addresses a core challenge of data collection, namely the costly manual aggregation of quantitative data from unstructured documents, and significantly improves the granularity, consistency, and verifiability of LCI datasets and MFA inputs. The application of the methodology is illustrated through examples related to battery and hydrogen technologies based on scientific articles and patents, and through case studies on copper and nickel production, with a focus on mining, based on industrial reports. Although applied here to life cycle assessment (LCA) and MFA, the approach can also support the extraction of other types of data and indicators relevant to environmental and resource analyses. The tool provides automated and reliable support for researchers aiming to extract comprehensive foundational data from heterogeneous sources.
How to cite: Bejjit, C. E., Monfort, D., Muller, S., Lai, F., Beylot, A., and Hennioui, D.: Mining and raw materials sector: Automated Data Extraction and Contextualization for Life Cycle Inventory (LCI) and Material Flow Analysis (MFA) Across Scientific Articles, Patents and Mining companies Reports, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-6501, https://doi.org/10.5194/egusphere-egu26-6501, 2026.