Research Project Title:
Generating Videos from an Image and a Sentence
Abstract: The advent of deep learning has led to dramatic improvements in algorithms for conditional image generation. Yet the problem of creating videos from images remains extremely challenging: most existing methods enforce strong priors on video datasets, require multiple frames as input, or generate videos that are blurry and incoherent. To overcome these limitations, I am exploring a new, multimodal domain of video generation in which models are conditioned on one image, which sets the scene, and a sentence, which specifies how the video should unfold by describing the actions of objects in the image. My goal is to develop neural networks for this setting and train them to synthesize coherent, high-quality video.
Most of my prior experience in vision and deep learning comes from UROPs and an internship at NVIDIA. The 2018 offering of 6.883, which provided a critical analysis of popular evaluation metrics for deep generative models, also helped prepare me to do rigorous research in the field. I joined the SuperUROP program because it offers a great opportunity to gain more experience implementing and critically evaluating novel architectures.