Alicia Li

aliciali@mit.edu

Scholar Title

MIT EECS | Nadar Foundation Undergraduate Research and Innovation Scholar

Research Title

Chunked Bidirectional Attention Transformers for Long Context Reasoning

Cohort

2025–2026

Department

Electrical Engineering and Computer Science; Mathematics

Research Areas

AI and Machine Learning
Natural Language and Speech Processing

Supervisor

Yoon Kim

yoonhkim@mit.edu

Abstract

Planning and sequential decision-making remain fundamental challenges for large language models (LLMs). Prior work has shown that vanilla transformer architectures are limited in state-tracking tasks, and that chain-of-thought (CoT) reasoning, while improving expressivity, incurs inference-time costs at long horizons. Bidirectional attention offers a promising alternative by producing richer contextual representations, but existing efficient implementations restrict bidirectional context to the prompt region, leaving reasoning traces causally attended. We propose chunked bidirectional attention, a novel architecture that extends bidirectional representations into the reasoning trace by recomputing full bidirectional attention once per fixed-size chunk rather than at every token. This design achieves enhanced expressivity with only constant-factor overhead during training and minimal overhead at inference. We initialize from pretrained Qwen3-0.6B weights and fine-tune on the OpenMathReasoning dataset, expecting our architecture to outperform standard fine-tuning of the base causal transformer on complex reasoning benchmarks.
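As a rough illustration of the chunking idea in the abstract, the sketch below builds one attention mask per chunk of the reasoning trace. It reflects one possible reading, not the project's actual implementation: at each chunk boundary, everything produced so far (prompt plus earlier chunks) is treated as a bidirectionally attended prefix, while tokens inside the current chunk attend causally. The function names, the boolean-mask representation, and the parameters `prompt_len`, `trace_len`, and `chunk_size` are all hypothetical.

```python
def prefix_bidirectional_mask(prefix_len, total_len):
    """Return mask[i][j] == True when position i may attend to position j.

    Assumption for illustration: positions before prefix_len attend to each
    other in both directions; later positions attend causally (to themselves
    and to every earlier position).
    """
    if not 0 <= prefix_len <= total_len:
        raise ValueError("prefix_len must lie in [0, total_len]")
    mask = []
    for i in range(total_len):
        if i < prefix_len:
            # Bidirectional region: sees the whole prefix, nothing after it.
            row = [j < prefix_len for j in range(total_len)]
        else:
            # Causal region: sees every position up to and including itself.
            row = [j <= i for j in range(total_len)]
        mask.append(row)
    return mask


def chunked_masks(prompt_len, trace_len, chunk_size):
    """Build one (prefix_len, mask) pair per chunk of the reasoning trace.

    The bidirectional prefix grows by one chunk at a time, so it is
    recomputed once per chunk rather than once per generated token.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = prompt_len + trace_len
    masks = []
    for start in range(0, trace_len, chunk_size):
        prefix_len = prompt_len + start
        end = min(total, prefix_len + chunk_size)
        masks.append((prefix_len, prefix_bidirectional_mask(prefix_len, end)))
    return masks


def count_bidirectional_passes(trace_len, chunk_size):
    """Number of prefix recomputations under this reading: ceil(trace_len / chunk_size)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return -(-trace_len // chunk_size)


masks = chunked_masks(prompt_len=2, trace_len=4, chunk_size=2)
```

Under this reading, a trace of `trace_len` tokens triggers `ceil(trace_len / chunk_size)` bidirectional recomputations instead of one per token, which is the kind of saving the abstract attributes to chunking.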

Quote

I’m participating in SuperUROP because I want to work on novel NLP architectures. Specifically, I’m interested in planning capabilities, an interest inspired by my previous UROP in robot planning and learning. I’m excited to learn more about NLP architectures and hope to publish a paper.
