
Arul Kolla
Efficient Generative AI Inference via Nonlinear Kernel Approximation
2025–2026
Electrical Engineering and Computer Science
- AI and Machine Learning
Chandrakasan, Anantha P.
Modern accelerators excel at linear operations, but nonlinearities account for a significant share of inference compute in both datacenter and edge deployments. We propose a co-design approach that (1) develops approximation algorithms for key nonlinear functions in generative models and (2) maps these approximations to efficient implementations on server and edge GPUs and on domain-specific accelerators. We will compare families of approximations and study their effects on accuracy, fine-tuning convergence, and robustness under distribution shift. On the hardware side, we will analyze how each method composes with existing compute primitives and propose microarchitectural extensions that reduce latency, area, and energy. Our evaluation combines analytical models with prototype implementations to quantify trade-offs across area, delay, and energy. The outcome is a systematic recipe for algorithm–hardware co-design that closes the nonlinear bottleneck in large-scale AI systems.
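As a purely illustrative sketch of what a hardware-friendly nonlinear approximation can look like, the snippet below fits a low-degree polynomial to GELU on a bounded input range and reports its worst-case error, the kind of accuracy trade-off the proposal would study. The choice of GELU, the [-4, 4] range, and the degree-6 fit are assumptions for this example, not methods specified by the project.

```python
# Illustrative sketch only: replace an exact nonlinearity (GELU) with a
# low-degree polynomial that needs only multiplies and adds, no erf/exp.
# The fitted range, degree, and use of NumPy's polyfit are assumptions.
import math
import numpy as np

def gelu_exact(x):
    """Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

# Fit a degree-6 polynomial to GELU over a clipped activation range.
xs = np.linspace(-4.0, 4.0, 4001)
coeffs = np.polyfit(xs, gelu_exact(xs), deg=6)

def gelu_poly(x):
    """Polynomial surrogate, evaluable with fused multiply-adds (Horner form).

    A deployed kernel would pass large |x| through directly (GELU(x) ~ x for
    x >> 0 and ~ 0 for x << 0); here we simply clip to the fitted range.
    """
    return np.polyval(coeffs, np.clip(x, -4.0, 4.0))

# Quantify approximation error over the fitted range: worst-case and mean.
err = np.abs(gelu_poly(xs) - gelu_exact(xs))
print(f"max |error| = {err.max():.4e}, mean |error| = {err.mean():.4e}")
```

A real study would sweep the degree, range, and number representation, and measure the downstream effect on model accuracy rather than only pointwise error.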
Through this SuperUROP, I aim to translate theory into working systems by prototyping hardware-friendly approximations for nonlinear layers and validating them end-to-end on real-world models. I’m excited to bring what I learned in classes like Hardware Architecture for Deep Learning (6.5930) to bear on these systems. My goal is to publish practical techniques that accelerate both datacenter and edge inference.