ELLIS Unit Stuttgart - Distinguished Lecture Series - Talk by Frank Keller
"Grounding Across Modalities and Domains". Talk by Prof. Dr. Frank Keller (University of Edinburgh). May 7, 2025.

In order to understand or generate multimodal inputs, AI systems must perform grounding – the process of linking entities or actions across different modalities. For example, objects depicted in images and videos need to be associated with corresponding textual references. However, large language models struggle with grounding, limiting their performance in tasks such as image generation and video understanding. In this talk, I will present two case studies demonstrating how explicit grounding can enhance multimodal AI. First, I will argue that character grounding is essential for visual storytelling – the task of turning a sequence of images into a coherent narrative. I will introduce a model that generates visually grounded stories by building coreference chains for characters across images and text, leading to stories that are more specific, coherent, and engaging. The second case study focuses on understanding instructional videos, such as those demonstrating cooking or home improvement tasks. In this domain, entities are often implicit (not mentioned in text) and frequently change (being merged, separated, or transformed), making grounding particularly challenging. I will present models that address this challenge by computing the semantic roles of both explicit and implicit entities and tracking them across instructional steps, even as they undergo transformations. These models enhance procedural understanding, improving AI's ability to follow and reason about complex tasks.