Over more than a decade, there has been an extensive research effort on how to effectively utilize recurrent models and attention. While recurrent models aim to compress the data into a fixed-size memory (called the hidden state), attention allows attending to the entire context window, capturing the direct dependencies of all tokens. This more accurate modeling of dependencies, however, comes with a quadratic cost, limiting the model to a fixed-length context. The authors present a new neural long-term memory module that learns to memorize historical context and helps attention attend to the current context while utilizing long-past information. They show that this neural memory has the advantage of fast, parallelizable training while maintaining fast inference. From a memory perspective, they argue that attention, due to its limited context but accurate dependency modeling, performs as a short-term memory, while the neural memory, due to its ability to memorize the data, acts as a long-term, more persistent memory. Based on these two modules, they introduce a new family of architectures called Titans and present three variants to address how memory can be effectively incorporated into the architecture. Their experimental results on language modeling, common-sense reasoning, genomics, and time-series tasks show that Titans are more effective than Transformers and recent modern linear recurrent models. They further demonstrate that Titans can effectively scale to context window sizes larger than 2M, with higher accuracy in needle-in-a-haystack tasks compared to baselines.

In this video, I talk about the following:
What is the difficulty in modeling long context in Transformers?
How does the neural long-term memory module memorize at test time? (A rough sketch is included at the end of this description.)
How do Titan models incorporate long-term and persistent memory into Transformers?
How do Titan models perform?

For more details, please look at https://arxiv.org/pdf/2501.00663
Behrouz, Ali, Peilin Zhong, and Vahab Mirrokni. "Titans: Learning to Memorize at Test Time." NeurIPS (2025).

Thanks for watching!
LinkedIn: http://aka.ms/manishgupta
HomePage: https://sites.google.com/view/manishg/
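For readers who want a concrete picture of the test-time memorization idea, here is a minimal PyTorch sketch based on the paper's description: a small neural memory M is updated during inference by gradient steps on an associative-memory loss ||M(k_t) - v_t||^2, with a momentum-like "surprise" term and a forgetting (decay) factor. This is not the authors' implementation; the class and hyperparameter names (NeuralMemory, eta, theta, alpha) are illustrative assumptions.

```python
# Sketch only: test-time memorization with surprise (momentum) and forgetting (decay).
# Not the official Titans code; names and hyperparameters are assumptions.
import torch
import torch.nn as nn


class NeuralMemory(nn.Module):
    """A tiny MLP acting as the long-term memory M(.)."""

    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))

    def forward(self, k: torch.Tensor) -> torch.Tensor:
        return self.net(k)


@torch.no_grad()
def _apply_update(memory: NeuralMemory, surprise: dict, grads: dict,
                  eta: float, theta: float, alpha: float) -> None:
    """In-place parameter update: momentum on past gradients plus weight decay."""
    for name, p in memory.named_parameters():
        s = surprise.get(name, torch.zeros_like(p))
        s = eta * s - theta * grads[name]      # surprise term: momentum of gradients
        surprise[name] = s
        p.mul_(1.0 - alpha).add_(s)            # (1 - alpha) * p + s: forget, then update


def memorize_at_test_time(memory: NeuralMemory, keys: torch.Tensor, values: torch.Tensor,
                          eta: float = 0.9, theta: float = 0.1, alpha: float = 0.01) -> None:
    """For each incoming (k_t, v_t) pair, take one gradient step on ||M(k_t) - v_t||^2."""
    surprise: dict = {}
    for k_t, v_t in zip(keys, values):
        loss = (memory(k_t) - v_t).pow(2).sum()                    # associative-memory loss
        grad_list = torch.autograd.grad(loss, list(memory.parameters()))
        grads = {n: g for (n, _), g in zip(memory.named_parameters(), grad_list)}
        _apply_update(memory, surprise, grads, eta, theta, alpha)


if __name__ == "__main__":
    dim = 64
    mem = NeuralMemory(dim)
    x = torch.randn(16, dim)                   # a toy stream of 16 token embeddings
    keys, values = x, x.roll(1, dims=0)        # illustrative key/value pairs
    memorize_at_test_time(mem, keys, values)   # memory weights are updated at inference time
```

The key design point the sketch tries to convey is that the memory is a set of learnable weights updated online during inference, not a fixed cache of past tokens, which is what lets it act as a compressed long-term memory alongside attention's short-term window.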