
Cooking by the Book: Using Knowledge from the Web to Improve Automatic Video Captioning

Undergraduate: Adam Aji


Faculty Advisor: Tamara Berg
Department: Computer Science


Automatic video captioning is the task of generating, without human intervention, a sentence in a natural language (e.g., English) that describes a given video sequence. Such captions can help users who cannot perceive a scene visually. Advances in artificial intelligence, computer vision, and natural language processing have made image and video captioning possible to some extent, but generated captions still lack specificity and accuracy. I present work toward improving video captioning by incorporating domain knowledge learned from the Web. The task is restricted to videos in the cooking domain, for which the Web offers many sources of related knowledge, such as cooking recipes and instructional videos. From these sources, we can computationally learn temporal relationships between the states of cooking ingredients (e.g., a whole apple precedes a sliced apple) and visual models of those states (e.g., an apple can appear red or white depending on its state), and use both to improve the video representation used for captioning.
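The abstract does not say how the temporal relationships are learned. One simple possibility is to count, across many recipes, how often one state of an ingredient is mentioned before another. The sketch below assumes that recipes have already been parsed into ordered (ingredient, state) mentions. The parsing step, the function name `learn_state_precedence`, and the toy data are all hypothetical and are not taken from this work.

```python
from collections import Counter
from itertools import combinations


def learn_state_precedence(recipes):
    """Estimate how often one state of an ingredient precedes another.

    Hypothetical input format: each recipe is a list of (ingredient, state)
    mentions in step order, e.g. [("apple", "whole"), ("apple", "sliced")].
    Returns {(ingredient, earlier, later): fraction of recipes containing both
    states in which `earlier` is first mentioned before `later`}.
    """
    before = Counter()
    for steps in recipes:
        # Keep only the first step index of each (ingredient, state) mention.
        first_seen = {}
        for i, mention in enumerate(steps):
            first_seen.setdefault(mention, i)
        # Dicts preserve insertion order, so each pair below is (earlier, later).
        for (ing_a, st_a), (ing_b, st_b) in combinations(first_seen, 2):
            if ing_a == ing_b:
                before[(ing_a, st_a, st_b)] += 1

    precedence = {}
    for (ing, a, b), count in before.items():
        # Counter returns 0 for missing keys without inserting them.
        total = count + before[(ing, b, a)]
        precedence[(ing, a, b)] = count / total
    return precedence


if __name__ == "__main__":
    # Toy, hand-written recipes; the third one is deliberately "noisy".
    recipes = [
        [("apple", "whole"), ("apple", "peeled"), ("apple", "sliced")],
        [("apple", "whole"), ("apple", "sliced")],
        [("apple", "sliced"), ("apple", "whole")],
    ]
    probs = learn_state_precedence(recipes)
    # whole precedes sliced in two of the three recipes that mention both
    print(probs[("apple", "whole", "sliced")])
```

Frequency counts like these tolerate noisy or oddly ordered recipes. A captioning system could then treat a high value, such as "whole before sliced", as a soft prior on which ingredient states are likely to follow one another in a video.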
