Tzu-Hsien Chan

MIT EECS - Quanta Computer Undergraduate Research and Innovation Scholar

Scaling WAMI: Enabling Large Scale Deployments of Web Accessible Multimodal Interfaces to Computers and Mobile Devices




James Glass


This project seeks to enable a more robust, wide-scale deployment of the WAMI framework, a JavaScript API for building speech recognition applications, across a wide range of devices and environments. WAMI's performance is currently limited in mobile environments such as smartphones, tablets, and other cellular devices. In addition, WAMI supports only basic interaction paradigms. For instance, it does not currently support a microphone-always-on configuration, so users must manually indicate when they are speaking. The project aims to address these issues by exploring solutions such as a Node.js server implementation and voice activity detection (VAD) algorithms. The work will be evaluated by building WAMI-based applications such as crowd-sourced speech recognition (Amazon Mechanical Turk) and cellular-based spoken dialogues (smartphones).


I've worked with Prof. Hiroshi Ishii at the MIT Media Lab on an augmented reality ping pong table. I've also worked at the largest online study website, building educational tools for the masses.

