Explanation of the paper "Mamba: Linear-Time Sequence Modeling with Selective State Spaces".

In this video I will be explaining Mamba, a new sequence modeling architecture that can compete with the Transformer. I will start by introducing the main sequence modeling architectures (RNN, CNN and Transformer) and then deep dive into State Space Models. To fully understand State Space Models, we need some background in differential equations, so I will give a brief introduction to differential equations (in 5 minutes!) and then derive the recurrent formula and the convolutional formula from first principles. I will also prove mathematically (with the help of visual diagrams) why State Space Models can be run as a convolution, and I will explain what the HIPPO matrix is and how it helps the model "memorize" the input history in a finite state.

In the second part of the video, I will explore Mamba and in particular the Selective Scan algorithm: first explaining what the scan operation is and how it can be parallelized, and then showing how the authors further improved the algorithm with kernel fusion and activation recomputation. I will also give a brief lesson on the memory hierarchy of the GPU and why some operations may be IO-bound.

In the last part of the video we will explore the architecture of Mamba and some performance results comparing it with the Transformer.

Below the chapter list you will find a compact recap of the SSM formulas and a toy parallel-scan sketch.

Slides PDF and Parallel Scan (Excel file): https://github.com/hkproj/mamba-notes

Chapters
00:00:00 - Introduction
00:01:46 - Sequence modeling
00:07:12 - Differential equations (basics)
00:11:38 - State Space Models
00:13:53 - Discretization
00:23:08 - Recurrent computation
00:26:32 - Convolutional computation
00:34:18 - Skip connection term
00:35:21 - Multidimensional SSM
00:37:44 - The HIPPO theory
00:43:30 - The motivation behind Mamba
00:46:56 - Selective Scan algorithm
00:51:34 - The Scan operation
00:54:24 - Parallel Scan
00:57:20 - Innovations in Selective Scan
00:58:00 - GPU Memory Hierarchy
01:01:23 - Kernel Fusion
01:01:48 - Activation recomputation
01:06:48 - Mamba architecture
01:10:18 - Performance considerations
01:12:54 - Conclusion
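For quick reference, here is a compact sketch of the State Space Model formulas covered in the video, written in the notation used by the S4 and Mamba papers (the video derives these step by step; the zero-order-hold discretization is the one used in the paper):

Continuous-time SSM:
  h'(t) = A h(t) + B x(t)
  y(t) = C h(t) + D x(t)   % D x(t) is the skip connection term

Discretization (zero-order hold, step size \Delta):
  \bar{A} = \exp(\Delta A)
  \bar{B} = (\Delta A)^{-1} (\exp(\Delta A) - I) \, \Delta B

Recurrent form:
  h_t = \bar{A} h_{t-1} + \bar{B} x_t
  y_t = C h_t

Convolutional form (kernel of length L):
  \bar{K} = (C\bar{B},\; C\bar{A}\bar{B},\; \dots,\; C\bar{A}^{L-1}\bar{B})
  y = x * \bar{K}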
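And a minimal, illustrative Python sketch of the parallel scan idea discussed in the video: a first-order linear recurrence h_t = a_t * h_{t-1} + b_t can be rewritten with an associative combine operator and computed as a prefix scan in O(log L) parallel steps. This is only a toy CPU illustration (the function names are mine, and it uses a simple Hillis-Steele sweep), not the authors' fused CUDA selective-scan kernel:

import math

def combine(left, right):
    # Associative operator for the recurrence h_t = a_t * h_{t-1} + b_t,
    # with each element represented as a pair (a, b).
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)

def sequential_scan(pairs):
    # Reference: plain left-to-right recurrence (what an RNN would do).
    out, acc = [], (1.0, 0.0)
    for p in pairs:
        acc = combine(acc, p)
        out.append(acc)
    return out

def parallel_scan(pairs):
    # Hillis-Steele inclusive prefix scan: log2(L) rounds, and every
    # combine inside a round is independent, so it could run in parallel.
    out = list(pairs)
    step = 1
    while step < len(out):
        out = [combine(out[i - step], out[i]) if i >= step else out[i]
               for i in range(len(out))]
        step *= 2
    return out

if __name__ == "__main__":
    # Toy scalar example: h_t = 0.5 * h_{t-1} + x_t over four inputs.
    pairs = [(0.5, x) for x in [1.0, 2.0, 3.0, 4.0]]
    seq = [h for _, h in sequential_scan(pairs)]
    par = [h for _, h in parallel_scan(pairs)]
    assert all(math.isclose(s, p) for s, p in zip(seq, par))
    print(par)  # [1.0, 2.5, 4.25, 6.125]

In Mamba the coefficients play the role of the input-dependent discretized matrices (a_t corresponding to the discretized A and b_t to the discretized B applied to x_t), which is exactly why the model has to be run as a scan rather than as a fixed convolution.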