Research
Our research program centers on LLM evaluation, AI safety, mechanistic interpretability, and human-LLM interaction. We combine mathematical rigour with empirical investigation to advance our understanding of how large language models process information, solve tasks, and interact with humans in real-world contexts.
LLM Evaluation
We study the science of LLM evaluation, utilising systematic reviews and statistical modelling to ground and quantify the measurement validity of LLM benchmarks. We also develop novel evaluation settings and frameworks for probing the limits of LLM reasoning in adversarial domains, including low-resource languages and interactive scenarios.
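To illustrate the kind of statistical treatment of benchmark scores this involves, the sketch below (an assumption-laden example, not our pipeline) computes a percentile-bootstrap confidence interval over per-item binary scores, so that a headline accuracy number comes with an uncertainty estimate.

```python
import numpy as np

def bootstrap_accuracy_ci(item_scores, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for benchmark accuracy.

    item_scores: per-item binary scores (1 = the model answered correctly).
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(item_scores, dtype=float)
    n = len(scores)
    # Resample items with replacement and recompute mean accuracy each time.
    resampled_means = rng.choice(scores, size=(n_resamples, n), replace=True).mean(axis=1)
    lower, upper = np.quantile(resampled_means, [alpha / 2, 1 - alpha / 2])
    return scores.mean(), (lower, upper)

# Hypothetical benchmark: 500 items, roughly 72% answered correctly.
rng = np.random.default_rng(1)
scores = rng.binomial(1, 0.72, size=500)
point, (lo, hi) = bootstrap_accuracy_ci(scores)
print(f"accuracy = {point:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```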
AI Safety
We investigate AI safety mechanisms, including toxicity reduction approaches, constitutional AI methods, and understanding the neural mechanisms behind safety fine-tuning algorithms like Direct Preference Optimization (DPO).
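For readers unfamiliar with DPO, the published objective (Rafailov et al., 2023) reduces to a simple per-pair loss over log-probabilities from the policy and a frozen reference model; the snippet below is a minimal sketch of that loss on toy numbers, not a description of our own training code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss for a batch of preference pairs.

    Each argument holds the summed log-probability of a full response under
    the policy or the frozen reference model, one entry per pair.
    """
    # Implicit reward of each response: beta * log(pi_theta / pi_ref).
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Bradley-Terry preference likelihood: maximise the margin between
    # the chosen and rejected implicit rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy batch of 4 preference pairs with made-up log-probabilities.
loss = dpo_loss(torch.tensor([-12.0, -9.5, -20.1, -7.3]),
                torch.tensor([-14.2, -11.0, -19.8, -9.9]),
                torch.tensor([-12.5, -9.7, -20.0, -7.5]),
                torch.tensor([-13.8, -10.8, -19.9, -9.6]))
print(loss.item())
```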
Mechanistic Interpretability
We develop methods to understand the internal mechanisms of large language models, including sparse autoencoders, steering vectors, neuron-level analysis, and techniques to decode how these systems process and represent information.
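As a concrete example of one of these tools, the sketch below shows a minimal sparse autoencoder of the kind commonly trained on residual-stream activations: an overcomplete ReLU dictionary with a reconstruction term plus an L1 sparsity penalty. It runs on random tensors standing in for real model activations and is illustrative only.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal sparse autoencoder over model activations.

    d_model: width of the activations being decomposed.
    d_hidden: number of (overcomplete) dictionary features.
    """
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, activations):
        # ReLU keeps feature activations non-negative; the L1 penalty below
        # pushes most of them to zero, yielding sparse, interpretable features.
        features = torch.relu(self.encoder(activations))
        reconstruction = self.decoder(features)
        return reconstruction, features

def sae_loss(reconstruction, activations, features, l1_coeff=1e-3):
    # Reconstruction error plus an L1 sparsity penalty on the features.
    mse = (reconstruction - activations).pow(2).mean()
    sparsity = features.abs().mean()
    return mse + l1_coeff * sparsity

# Toy usage on random "activations" in place of a real model's hidden states.
sae = SparseAutoencoder(d_model=512, d_hidden=4096)
acts = torch.randn(64, 512)
recon, feats = sae(acts)
print(sae_loss(recon, acts, feats).item())
```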
Human-LLM Interaction
We conduct large-scale empirical studies examining how humans interact with AI systems for decision-making, including our landmark study of 1,300 participants exploring LLM use in medical self-diagnosis and healthcare applications.