David Couto
MIT EECS Undergraduate Research and Innovation Scholar
Prosodic Structures Influence in Speech and Landmark Detection
2013–2014
Prosody is a broad but important part of speech that helps convey emotion and flow in a string of words and can place an utterance in context. It describes the accent and word-grouping patterns of spoken language; in signal-processing terms, these features appear acoustically in the pitch, duration, amplitude, and signal quality of the recorded voice. This project will expand the current prosodic modules to cover events such as irregular pitch periods, which cause word endings to vary across contexts. An extension to the project will tie together the many independent modules of the Speech and Communication Group's speech recognition project into a complete, functional system.
After working for instrumentation companies specializing in real-time data processing of X-ray and microwave signals, I wanted to apply similar techniques to music. I was interested in making music production faster by using only my voice, and I began working on automatic note transcription. This requires understanding the human voice, and speech recognition was the logical place to begin.