Alicia Li
MIT EECS | Nadar Foundation Undergraduate Research and Innovation Scholar
Chunked Bidirectional Attention Transformers for Long Context Reasoning
2025–2026
Electrical Engineering and Computer Science; Mathematics
- AI and Machine Learning
- Natural Language and Speech Processing
Yoon Kim
Planning and sequential decision-making remain fundamental challenges for large language models (LLMs). Prior work has shown that vanilla transformer architectures are limited on state-tracking tasks, and that chain-of-thought (CoT) reasoning, while improving expressivity, incurs inference-time costs that grow with the length of the reasoning horizon. Bidirectional attention offers a promising alternative by producing richer contextual representations, but existing efficient implementations restrict bidirectional context to the prompt region, leaving the reasoning trace causally attended. We propose chunked bidirectional attention, a novel architecture that extends bidirectional representations into the reasoning trace by recomputing full bidirectional attention once per fixed-size chunk rather than at every token. This design aims to increase expressivity with only constant-factor overhead during training and minimal overhead at inference. We initialize from pretrained Qwen3-0.6B weights and fine-tune on the OpenMathReasoning dataset. We expect our architecture to outperform standard fine-tuning of the base causal transformer on complex reasoning benchmarks.
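As a rough illustration of the chunking idea described above, the sketch below builds the attention pattern for a single chunk step and lists the positions at which a full bidirectional recomputation would happen. The abstract does not specify the masking details, so everything here is an assumption made for illustration: the function names `chunk_step_mask` and `refresh_points` are hypothetical, and the choice to let already-processed tokens attend to each other bidirectionally while the current chunk stays causal is one plausible reading, not the project's confirmed design.

```python
def chunk_step_mask(prefix_len, chunk_len):
    """Attention pattern for one chunk step (illustrative assumption).

    mask[i][j] is True when position i may attend to position j.
    Positions before prefix_len (prompt plus finished chunks) see each
    other bidirectionally; positions in the current chunk see the whole
    prefix and are causal within the chunk.
    """
    n = prefix_len + chunk_len
    mask = []
    for i in range(n):
        if i < prefix_len:
            # Prefix token: full bidirectional view of the prefix only.
            row = [j < prefix_len for j in range(n)]
        else:
            # Current-chunk token: the prefix plus earlier chunk tokens.
            row = [j <= i for j in range(n)]
        mask.append(row)
    return mask


def refresh_points(prompt_len, trace_len, chunk_size):
    """Prefix lengths at which the bidirectional pass would be recomputed.

    One recomputation per chunk start, instead of one per generated token.
    """
    return list(range(prompt_len, prompt_len + trace_len, chunk_size))


if __name__ == "__main__":
    for row in chunk_step_mask(prefix_len=2, chunk_len=2):
        print("".join("x" if allowed else "." for allowed in row))
    print(refresh_points(prompt_len=4, trace_len=6, chunk_size=3))
```

Under these assumptions, a reasoning trace of T tokens with chunk size C triggers roughly T / C bidirectional recomputations rather than T, which is the intuition behind amortizing the cost of bidirectional context over each chunk.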
I’m participating in SuperUROP because I want to work on novel NLP architectures. I’m particularly interested in planning capabilities, an interest that grew out of my previous UROP in robot planning and learning. I’m excited to learn more about NLP architectures, and I hope to publish a paper.
