Isha Agarwal
MIT EECS | Undergraduate Research and Innovation Scholar
A Mechanistic Analysis of LLM & User Persona Interaction Bias
2025–2026
Electrical Engineering and Computer Science
- AI and Machine Learning
Marzyeh Ghassemi
As the popularity of Large Language Models (LLMs) has risen, personas, simulated roles assigned to LLMs, have emerged as a useful tool for task personalization. In addition to adopting their own personas, LLMs have also been shown to maintain and update an internal user model based on user inputs. While prior work has studied these aspects of personalization in isolation, we propose that the interaction between the LLM persona and its internal user model captures crucial behaviors and biases. We use linear probing and representation analysis to explore how interactions between the LLM persona and user profiles are embedded in each layer of the network. Through this analysis, we demonstrate that the interaction between the persona and the user model affects both the LLM's internal processing and its downstream responses to user queries. Furthermore, we provide evidence that, based on user interactions, an LLM changes how it internally represents and relates its own personas. These results underscore the potential harms and biases that LLMs propagate in trying to maximize personalization.
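To illustrate the kind of layer-wise analysis the abstract describes, here is a minimal sketch of per-layer linear probing, not the project's actual code. It assumes a HuggingFace causal LM and a labeled set of persona/user prompts; the model name is a placeholder and `load_labeled_prompts` is a hypothetical helper standing in for the project's dataset.

```python
# Minimal per-layer linear probing sketch (illustrative only).
import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

model_name = "gpt2"  # placeholder; any causal LM with accessible hidden states works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True).eval()

# Hypothetical helper: returns prompt texts plus labels for a persona/user attribute.
prompts, labels = load_labeled_prompts()

# Collect the last-token hidden state at every layer for each prompt.
per_layer_feats = []
with torch.no_grad():
    for text in prompts:
        inputs = tok(text, return_tensors="pt")
        hidden = model(**inputs).hidden_states              # tuple of (1, seq_len, dim), one per layer
        per_layer_feats.append([h[0, -1, :].numpy() for h in hidden])
X = np.array(per_layer_feats)                               # shape: (n_prompts, n_layers, dim)
y = np.array(labels)

# Fit one linear probe per layer; high held-out accuracy suggests the attribute
# is linearly decodable from that layer's representation.
for layer in range(X.shape[1]):
    X_tr, X_te, y_tr, y_te = train_test_split(X[:, layer, :], y, test_size=0.3, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(f"layer {layer:2d}: probe accuracy = {probe.score(X_te, y_te):.2f}")
```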
By participating in SuperUROP, I hope to gain more hands-on experience completing an end-to-end research project. I am particularly excited to apply my prior research experience in mechanistic interpretability, gained through UROPs and summer programs, to debiasing research. I look forward to learning more about interpretability and bias in AI while making a meaningful contribution in this space.
