Automating Requirements Elicitation using Large Language Models and Speech Processing

Zahra Fardhosseini; Andrea Ackermann; Beate Oerder

doi:https://doi.org/10.5194/egusphere-egu26-20967


EGU26-20967, updated on 14 Mar 2026


EGU General Assembly 2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

Poster | Tuesday, 05 May, 16:15–18:00 (CEST), Display time Tuesday, 05 May, 14:00–18:00

Hall X4, X4.36


Zahra Fardhosseini¹, Andrea Ackermann², and Beate Oerder¹


¹Johann Heinrich von Thünen-Institut, Center for Information Management, Braunschweig, Germany (zahra.fardhosseini@thuenen.de)
²Johann Heinrich von Thünen-Institut, Institute of Rural Studies, Braunschweig, Germany (andrea.ackermann@thuenen.de)

Problem:

Eliciting user requirements is a critical first step in successfully planning software projects, particularly when a diverse set of use cases must be considered. In practice, this process is often carried out through oral interviews and handwritten notes, which is time-consuming and error-prone and makes structured processing and subsequent analysis difficult.

Approach:

We present an integrated, automated pipeline that supports the systematic collection and analysis of user requirements, from capturing user perspectives to model-supported analysis. The goal is to gather requirements consistently across different stakeholder roles, including project leads, technical staff, and data scientists. In the first step, requirements are collected using a web-based, structured questionnaire. In the second step, the questionnaire serves as a guideline for follow-up interviews, in which detailed case descriptions further refine the requirements.

Figure 1. The pipeline for requirements elicitation, LLM-based analysis, and human-in-the-loop review.

The interviews are recorded and automatically transcribed using a domain-adapted Language Recognition Component (LRC) based on open-source Automatic Speech Recognition (ASR) models. The resulting transcripts are combined with questionnaire responses and initial analysis artifacts, such as charts and diagrams, and processed within a Large Language Model (LLM) pipeline. After requirements have been collected, the pipeline supports the systematic inspection of individual requirements and their consideration in project planning.

Using a dedicated prompt schema, the LLM-based analysis supports the identification of functional and non-functional requirements, highlights open needs, clusters related issues, and organizes the results according to relevant work contexts. A human-in-the-loop review module enables targeted corrections, quality assurance, and iterative improvement of the analysis results.
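As an illustration of what a prompt schema of this kind might look like, the sketch below assembles one analysis prompt from a stakeholder role, questionnaire answers, and a transcript excerpt. It is not the schema used in the project: the names `ElicitationInput`, `build_prompt`, and `CATEGORIES`, as well as the prompt wording, are assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical label set; the abstract names functional and non-functional
# requirements and open needs, but not the exact labels used in the prompts.
CATEGORIES = ("functional", "non-functional", "open need")


@dataclass
class ElicitationInput:
    role: str           # stakeholder role, e.g. "data scientist"
    questionnaire: str  # free-text questionnaire answers
    transcript: str     # interview transcript excerpt


def build_prompt(item: ElicitationInput) -> str:
    """Assemble one analysis prompt from a fixed schema (illustrative only)."""
    labels = ", ".join(CATEGORIES)
    return (
        "You are analysing user requirements for a software project.\n"
        f"Stakeholder role: {item.role}\n"
        f"Questionnaire answers:\n{item.questionnaire}\n"
        f"Interview transcript:\n{item.transcript}\n"
        f"List each requirement on its own line and label it as one of: {labels}.\n"
        "Group related requirements under a short cluster title."
    )
```

Keeping the schema in one function means every stakeholder role is analysed with the same instructions, which is one way to obtain the consistency across roles that the approach aims for.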

Implementation test:

To validate our end‑to‑end requirements‑engineering pipeline, we applied it to the IACS‑AI Data‑Management‑Remodeling project. A web‑based survey (Nov–Dec 2024) yielded 53 responses, giving an initial, structured view of user requirements. Subsequently, we held seven interviews with 18 participants (project managers, engineers, and data scientists), producing more than 460 min of video, which was then transcribed with the freeware tool Scraibe.

A prompt‑engineering routine fed these inputs to a Large Language Model (Llama 3.3), which detected semantic clusters, classified requirements, and identified problem statements. For each requirement, we kept the highest‑probability class for further review. The resulting insights shaped the next milestone: the design and implementation of a data‑pipeline architecture that fulfills the extracted functional and non‑functional requirements.
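The selection step described above (keeping the highest-probability class per requirement) can be sketched as follows. This is a minimal illustration, assuming the classifier output is available as a mapping from class label to probability; the function names and that data layout are assumptions, not the project's actual interface.

```python
def top_class(scores: dict[str, float]) -> tuple[str, float]:
    """Return the label with the highest probability and that probability.

    Raises ValueError if `scores` is empty.
    """
    label = max(scores, key=scores.get)
    return label, scores[label]


def select_classes(
    requirements: dict[str, dict[str, float]],
) -> dict[str, tuple[str, float]]:
    """Map each requirement text to its highest-probability class."""
    return {text: top_class(scores) for text, scores in requirements.items()}
```

Returning the probability alongside the label lets a reviewer in the human-in-the-loop step prioritize requirements whose top class was assigned with low confidence.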

Conclusion:

The reproducible design of the pipeline ensures traceability by documenting when, by whom, and in which context requirements were expressed, as well as how project decisions are derived from them. The result is a lightweight yet structured approach to requirements elicitation that improves transparency and reproducibility while reducing manual effort and errors.
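One possible way to capture the traceability information named above (when, by whom, and in which context a requirement was expressed, and which decisions follow from it) is a small immutable record. The sketch below is an assumption about how such a record could look; the class `RequirementRecord` and all of its field names are hypothetical and not taken from the project.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)
class RequirementRecord:
    req_id: str                      # hypothetical identifier scheme
    text: str                        # the requirement as stated
    stated_by: str                   # stakeholder role (by whom)
    stated_on: date                  # date of the statement (when)
    context: str                     # e.g. questionnaire item or interview
    decisions: tuple[str, ...] = ()  # ids of decisions derived from it


def to_json(record: RequirementRecord) -> str:
    """Serialize a record to JSON, writing the date in ISO 8601 form."""
    data = asdict(record)
    data["stated_on"] = record.stated_on.isoformat()
    return json.dumps(data, ensure_ascii=False, sort_keys=True)
```

Freezing the dataclass prevents a stored record from being modified in place, so any correction made during review has to be written as a new record and the history stays inspectable.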

Because the pipeline is generic, it is well suited to contexts with many stakeholders, heterogeneous use cases, and strong documentation and traceability needs. Beyond our scientific implementation test, it can therefore also be applied to enterprise software, AI‑driven data projects, e‑government systems, and regulated domains such as healthcare or finance.

How to cite: Fardhosseini, Z., Ackermann, A., and Oerder, B.: Automating Requirements Elicitation using Large Language Models and Speech Processing, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-20967, https://doi.org/10.5194/egusphere-egu26-20967, 2026.