Isha Agarwal
MIT EECS | Undergraduate Research and Innovation Scholar
A Mechanistic Analysis of LLM & User Persona Interaction Bias
2025–2026
Electrical Engineering and Computer Science
- AI and Machine Learning
Marzyeh Ghassemi
As the popularity of Large Language Models (LLMs) has risen, personas, simulated roles assigned to LLMs, have emerged as a useful tool for task personalization. In addition to adopting their own personas, LLMs have also been shown to maintain and update an internal user model based on user inputs. While prior work has studied these aspects of personalization in isolation, we propose that the interaction between the LLM persona and its internal user model captures crucial behaviors and biases. We use linear probing and representation analysis to explore how interactions between the LLM persona and user profiles are embedded in each layer of the network. Through this analysis, we demonstrate that the interaction between the persona and the user model affects both the LLM's internal processing and its downstream responses to user queries. Furthermore, we provide evidence that, based on user interactions, an LLM changes how it internally represents and relates its own personas. These results underscore the potential harms and biases that LLMs propagate in trying to maximize personalization.
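To illustrate the kind of layer-wise analysis the abstract describes, here is a minimal sketch of per-layer linear probing, not the project's actual code. It assumes a HuggingFace causal LM and a labeled set of persona/user prompts; the model name is a placeholder and `load_labeled_prompts` is a hypothetical helper standing in for the project's dataset.

```python
# Minimal per-layer linear probing sketch (illustrative only).
import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

model_name = "gpt2"  # placeholder; any causal LM with accessible hidden states works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True).eval()

# Hypothetical helper: returns prompt texts plus labels for a persona/user attribute.
prompts, labels = load_labeled_prompts()

# Collect the last-token hidden state at every layer for each prompt.
per_layer_feats = []
with torch.no_grad():
    for text in prompts:
        inputs = tok(text, return_tensors="pt")
        hidden = model(**inputs).hidden_states              # tuple of (1, seq_len, dim), one per layer
        per_layer_feats.append([h[0, -1, :].numpy() for h in hidden])
X = np.array(per_layer_feats)                               # shape: (n_prompts, n_layers, dim)
y = np.array(labels)

# Fit one linear probe per layer; high held-out accuracy suggests the attribute
# is linearly decodable from that layer's representation.
for layer in range(X.shape[1]):
    X_tr, X_te, y_tr, y_te = train_test_split(X[:, layer, :], y, test_size=0.3, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(f"layer {layer:2d}: probe accuracy = {probe.score(X_te, y_te):.2f}")
```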
By participating in SuperUROP, I hope to gain more hands-on experience completing an end-to-end research project. I am particularly excited to apply my prior research experience in mechanistic interpretability, gained through UROPs and summer programs, to debiasing research. I look forward to learning more about interpretability and bias in AI while making a meaningful contribution in this space.
