Designing an LLM-Based Sustainability Assistant Leveraging Structured and Unstructured ESG Data
- Type: Master's thesis
- Supervisor:
Introduction
Large Language Models (LLMs) have the potential to democratize access to vast amounts of data by enabling intuitive, natural language interaction. In the domain of environmental, social, and governance (ESG) reporting, this is particularly relevant: while most available data is unstructured (e.g., textual reports), the most valuable insights are often contained in structured datasets.
This thesis aims to design and implement an LLM-based sustainability assistant built on a unique dataset: the annual and sustainability reports of the 600 largest publicly listed corporations in Europe (2014–2023), combined with more than 2.8 million ESG indicators extracted from these reports. The student will explore how user interactions differ when querying structured data, unstructured data, or both through a hybrid retrieval-augmented generation (RAG) system. Additionally, the thesis will investigate methods for verifying and validating the assistant's outputs, and how these methods affect users' decision-making.
Aim
- Design an LLM-based sustainability assistant building on the provided ESG dataset
- Explore different approaches to explaining the assistant's outputs
- Design and conduct an empirical evaluation of different variants of the assistant
- Analyze the quantitative results
- Possibility of a joint research publication
Requirements
- Good programming skills in Python
- Experience with LLMs/RAG
- Interest in Human-Computer Interaction/Human-AI Collaboration
- Interest in quantitative research
- Strong time management and communication skills, and proficiency in English