Introduction

Large Language Models (LLMs) have the potential to democratize access to vast amounts of data by enabling intuitive, natural language interaction. In the domain of environmental, social, and governance (ESG) reporting, this is particularly relevant: while most available data is unstructured (e.g., textual reports), the most valuable insights are often contained in structured datasets.
This thesis aims to design and implement an LLM-based sustainability assistant that builds on a unique dataset: the annual and sustainability reports of the 600 largest publicly listed corporations in Europe (2014–2023), combined with more than 2.8 million ESG indicators extracted from these reports. The student will explore how user interactions differ when querying structured data, unstructured data, or a hybrid retrieval-augmented generation (RAG) system that combines both. Additionally, the thesis will investigate methods for verifying and validating the assistant’s outputs, as well as the impact of these methods on users’ decision-making.
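
To make the three query modes concrete, the following is a minimal sketch of how structured lookup, unstructured retrieval, and a hybrid of both could feed context to an LLM. It is an illustration under stated assumptions, not the design the thesis prescribes: the record types, the function names, the keyword-overlap ranking, and all data values (including the company "ExampleCorp") are hypothetical placeholders and do not come from the actual dataset. A real system would likely use a database for the indicators and embedding-based search for the report text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    """One structured ESG data point (hypothetical schema)."""
    company: str
    year: int
    name: str
    value: float
    unit: str


@dataclass(frozen=True)
class Passage:
    """One text snippet from a report (hypothetical schema)."""
    company: str
    year: int
    text: str


# Placeholder records invented for illustration only; not real ESG data.
INDICATORS = [
    Indicator("ExampleCorp", 2022, "scope_1_emissions", 100.0, "tCO2e"),
    Indicator("ExampleCorp", 2023, "scope_1_emissions", 90.0, "tCO2e"),
]
PASSAGES = [
    Passage("ExampleCorp", 2023, "We reduced direct emissions by upgrading our boilers."),
    Passage("ExampleCorp", 2023, "The board approved a new diversity policy."),
]


def lookup_indicators(company, year, name):
    """Structured path: exact-match filter over the indicator records."""
    return [
        i for i in INDICATORS
        if i.company == company and i.year == year and i.name == name
    ]


def tokenize(text):
    """Lowercase word set with surrounding punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()} - {""}


def retrieve_passages(question, company, year, k=1):
    """Unstructured path: rank passages by word overlap with the question.

    Keyword overlap stands in for a real retriever (e.g. embeddings).
    """
    query = tokenize(question)
    candidates = [p for p in PASSAGES if p.company == company and p.year == year]
    ranked = sorted(
        candidates, key=lambda p: len(query & tokenize(p.text)), reverse=True
    )
    return ranked[:k]


def build_context(question, company, year, indicator_name, mode="hybrid"):
    """Assemble the context an LLM prompt would receive for the given mode."""
    lines = []
    if mode in ("structured", "hybrid"):
        for i in lookup_indicators(company, year, indicator_name):
            lines.append(
                f"[indicator] {i.company} {i.year} {i.name} = {i.value} {i.unit}"
            )
    if mode in ("unstructured", "hybrid"):
        for p in retrieve_passages(question, company, year):
            lines.append(f"[report] {p.company} {p.year}: {p.text}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_context(
        "How did the company reduce direct emissions?",
        "ExampleCorp", 2023, "scope_1_emissions",
    ))
```

Tagging each context line with its origin ([indicator] or [report]) is one simple way to let the assistant cite where a statement came from, which could support the verification and validation methods the thesis investigates.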

Aim

Design an LLM-based sustainability assistant that builds on the given ESG dataset
Explore different explanation approaches
Design and conduct an empirical evaluation of different variations of the assistant
Analyze the quantitative results
Possibility for a joint research publication

Requirements

Good programming skills in Python
Experience with LLMs/RAG
Interest in Human-Computer Interaction/Human-AI Collaboration
Interest in quantitative research
Strong time management and communication skills, and proficiency in English