Fully Fused Map, Reduce And Scan Cuda Kernels In Spiral
After finishing the matrix multiplication, we need some simpler kernels that we can use to implement activation functions and bias additions for our ML library, and in this video we do exactly that. Want to know how to implement layer normalization, softmax, and discrete distribution samplers in the most elegant way possible? Try this video.

---

#spiral #functionalprogramming #machinelearning #reinforcementlearning #programming #cpp #programminglanguage #compiler #parallelprogramming #cuda #gpu

Playlist (Staged FP in Spiral): • Staged Functional Programming In Spiral
Playlist (ML Library): • Spiral's ML Library

Spiral: https://github.com/mrakgr/The-Spiral-...
Github: https://github.com/mrakgr/

If you have interesting work opportunities and require an expert functional programmer, don't hesitate to get in touch. My email is on my Github profile. Put "Work" as the subject in order to avoid the spam filters.

Music:
00:10:31:22 Knights of Round 4
01:41:00:00 Sonic Hybrid Orchestra - 東方の嵐 -TOHO TEMPEST [東方不敗小町2]
02:50:21:00 TAMusic - Touhou Violin 1
03:33:01:21 彩音 ~xi-on~ - 東方志奏 1st Spell Airstream
04:01:51:07 彩音 ~xi-on~ - 東方志奏 2nd Spell Fullmoon

TOC:
00:00:00 - Start
00:00:06 - Intro - What are our plans
00:08:51 - According to rumors, the new Nvidia cards should be surprisingly cheap
00:10:32 - The map kernel is easy - Timelapse - 5m
00:14:33 - In order to use register memory, the loops need to be fully unrolled
00:15:49 - In order to take full advantage of the GPU, our ML library will split the work into blockwise chunks
00:24:04 - Subgroup ranges - Timelapse - 25m
00:33:41 - Starting work on the reduction
00:35:30 - Dealing with the subgroups
00:40:36 - Implementing reduce_2d
00:49:06 - Interlude - Explanation of the warp reduction
00:59:56 - Inclusive and exclusive scans - Timelapse - 6m
01:05:40 - Interlude - How the inclusive scan works
01:17:33 - Fixing the exclusive_scan in the Cuda library - Timelapse - 2:36h
01:23:17 - Testing and debugging the new primitives
01:45:57 - reduce
02:05:37 - Fused kernel - map_reduce_replicate_map_2d
02:31:23 - The softmax and the DSL for local 2d operations
03:07:26 - Implementing local_scan
03:26:53 - Sampling from discrete probability distributions for action selection
03:44:32 - We make it look easy
03:45:24 - Have we done the rigid_merge correctly?
03:47:35 - Adapting the matrix multiplication kernel for block processing
03:54:11 - Interlude
03:58:04 - Sketching out the fundamentals in code
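The 00:49:06 segment explains the warp reduction, where each lane repeatedly combines its value with the lane a fixed offset ahead, halving the offset each round until lane 0 holds the full reduction. The video implements this in Spiral compiling down to CUDA shuffle intrinsics; as a rough CPU-side sketch of just the combining pattern (not the actual library code), the idea looks like this:

```python
def warp_reduce(values, op):
    """CPU sketch of a shuffle-style warp reduction.

    Each round, lane i combines its value with the value held by
    lane i + offset, and the offset halves each round; after
    log2(warp_size) rounds, lane 0 holds the reduction of all lanes.
    """
    lanes = list(values)
    n = len(lanes)  # warp size, assumed a power of two (32 on Nvidia)
    offset = n // 2
    while offset > 0:
        for lane in range(n):
            if lane + offset < n:
                lanes[lane] = op(lanes[lane], lanes[lane + offset])
        offset //= 2
    return lanes[0]

print(warp_reduce(range(32), lambda a, b: a + b))  # sum of 0..31 -> 496
```

On the GPU all lanes execute each round in lockstep, so the whole reduction costs only log2(32) = 5 combining steps per warp.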
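The scan segments (00:59:56 and 01:05:40) distinguish inclusive scans, where element i includes the i-th input, from exclusive scans, where element i is the running total of everything before it. The following minimal Python sketch (an illustration of the definitions, not Spiral's implementation) shows the relationship between the two:

```python
from itertools import accumulate
import operator

def inclusive_scan(xs, op=operator.add):
    # Inclusive scan: output[i] is the reduction of xs[0..i].
    return list(accumulate(xs, op))

def exclusive_scan(xs, op=operator.add, identity=0):
    # Exclusive scan: output[i] is the reduction of xs[0..i-1],
    # so the identity leads and the last input drops out.
    return [identity] + inclusive_scan(xs[:-1], op)

print(inclusive_scan([1, 2, 3, 4]))  # [1, 3, 6, 10]
print(exclusive_scan([1, 2, 3, 4]))  # [0, 1, 3, 6]
```

An exclusive scan is just the inclusive scan shifted right by one with the identity prepended, which is why the two are often implemented in terms of each other.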
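The 02:31:23 and 03:26:53 segments cover softmax and sampling from the resulting discrete distribution for action selection. A common way to do this, sketched below in plain Python as an assumption about the general technique rather than the video's actual kernel, is a numerically stable softmax followed by inverse-CDF sampling over the running (scanned) probabilities:

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating
    # so large logits cannot overflow.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_discrete(probs, u=None):
    # Inverse-CDF sampling: draw u in [0, 1) and return the first index
    # whose running (inclusive-scan) total exceeds u.
    if u is None:
        u = random.random()
    running = 0.0
    for i, p in enumerate(probs):
        running += p
        if u < running:
            return i
    return len(probs) - 1  # guard against floating-point round-off

probs = softmax([1.0, 2.0, 3.0])
print(sample_discrete(probs, u=0.5))  # -> 2 (the highest-probability action)
```

The running total is exactly an inclusive scan of the probabilities, which is why a fused map (softmax) + scan + search maps so naturally onto the primitives built earlier in the video.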