Research
Our research program centers on LLM evaluation, AI safety, mechanistic interpretability, and human-LLM interaction. We combine mathematical rigour with empirical investigation to advance our understanding of how large language models process information, solve tasks, and interact with humans in real-world contexts.
LLM Evaluation
We study the science of LLM evaluation, utilising systematic reviews and statistical modelling to ground and quantify the measurement validity of LLM benchmarks. We also develop novel evaluation settings and frameworks for probing the limits of LLM reasoning in adversarial domains, including low-resource languages and interactive scenarios.
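To illustrate the kind of statistical treatment of benchmark scores this involves, the sketch below (an assumption-laden example, not our pipeline) computes a percentile-bootstrap confidence interval over per-item binary scores, so that a headline accuracy number comes with an uncertainty estimate.

```python
import numpy as np

def bootstrap_accuracy_ci(item_scores, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for benchmark accuracy.

    item_scores: per-item binary scores (1 = the model answered correctly).
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(item_scores, dtype=float)
    n = len(scores)
    # Resample items with replacement and recompute mean accuracy each time.
    resampled_means = rng.choice(scores, size=(n_resamples, n), replace=True).mean(axis=1)
    lower, upper = np.quantile(resampled_means, [alpha / 2, 1 - alpha / 2])
    return scores.mean(), (lower, upper)

# Hypothetical benchmark: 500 items, roughly 72% answered correctly.
rng = np.random.default_rng(1)
scores = rng.binomial(1, 0.72, size=500)
point, (lo, hi) = bootstrap_accuracy_ci(scores)
print(f"accuracy = {point:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```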
AI Safety
We investigate AI safety mechanisms, including toxicity reduction approaches, constitutional AI methods, and understanding the neural mechanisms behind safety fine-tuning algorithms like Direct Preference Optimization (DPO).
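For readers unfamiliar with DPO, the published objective (Rafailov et al., 2023) reduces to a simple per-pair loss over log-probabilities from the policy and a frozen reference model; the snippet below is a minimal sketch of that loss on toy numbers, not a description of our own training code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss for a batch of preference pairs.

    Each argument holds the summed log-probability of a full response under
    the policy or the frozen reference model, one entry per pair.
    """
    # Implicit reward of each response: beta * log(pi_theta / pi_ref).
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Bradley-Terry preference likelihood: maximise the margin between
    # the chosen and rejected implicit rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy batch of 4 preference pairs with made-up log-probabilities.
loss = dpo_loss(torch.tensor([-12.0, -9.5, -20.1, -7.3]),
                torch.tensor([-14.2, -11.0, -19.8, -9.9]),
                torch.tensor([-12.5, -9.7, -20.0, -7.5]),
                torch.tensor([-13.8, -10.8, -19.9, -9.6]))
print(loss.item())
```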
Mechanistic Interpretability
We develop methods to understand the internal mechanisms of large language models, including sparse autoencoders, steering vectors, neuron-level analysis, and techniques to decode how these systems process and represent information.
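As a concrete example of one of these tools, the sketch below shows a minimal sparse autoencoder of the kind commonly trained on residual-stream activations: an overcomplete ReLU dictionary with a reconstruction term plus an L1 sparsity penalty. It runs on random tensors standing in for real model activations and is illustrative only.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal sparse autoencoder over model activations.

    d_model: width of the activations being decomposed.
    d_hidden: number of (overcomplete) dictionary features.
    """
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, activations):
        # ReLU keeps feature activations non-negative; the L1 penalty below
        # pushes most of them to zero, yielding sparse, interpretable features.
        features = torch.relu(self.encoder(activations))
        reconstruction = self.decoder(features)
        return reconstruction, features

def sae_loss(reconstruction, activations, features, l1_coeff=1e-3):
    # Reconstruction error plus an L1 sparsity penalty on the features.
    mse = (reconstruction - activations).pow(2).mean()
    sparsity = features.abs().mean()
    return mse + l1_coeff * sparsity

# Toy usage on random "activations" in place of a real model's hidden states.
sae = SparseAutoencoder(d_model=512, d_hidden=4096)
acts = torch.randn(64, 512)
recon, feats = sae(acts)
print(sae_loss(recon, acts, feats).item())
```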
Human-LLM Interaction
We conduct large-scale empirical studies examining how humans interact with AI systems for decision-making, including our landmark study of 1,300 participants exploring LLM use in medical self-diagnosis and healthcare applications.