[Podcast] The Gradient Bottleneck
https://arxiv.org/pdf/2603.10145

Lost in Backpropagation: The LM Head Gradient Bottleneck

This research paper identifies a critical optimization flaw in neural language models, which the authors call the gradient bottleneck. The softmax bottleneck is usually discussed as a limit on model expressivity, but the authors show that it also severely restricts training efficiency: the high-dimensional feedback from the vocabulary is compressed through a much smaller output layer. Their theoretical and empirical findings reveal that 95–99% of the gradient norm is lost during backpropagation, effectively discarding vital training signal and replacing it with noise. Experiments on 2B-parameter models show that this bottleneck can slow training convergence by up to sixteen times and can render simple patterns unlearnable as vocabulary size increases. Ultimately, the study argues that the current design of language model heads is fundamentally inefficient, highlighting the need for new architectures that better preserve the flow of information during optimization.

#deeplearning #ai #research
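The compression effect described above can be illustrated with a toy NumPy sketch (not code from the paper; the dimensions and the projection-based norm measurement are illustrative assumptions). A logit-space gradient lives in vocabulary dimension V, but backpropagation through an LM head of hidden width d « V can only transmit the component lying in the rank-d row space of the head matrix; for a generic gradient, that preserves roughly d/V of the squared norm:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 8192, 128  # toy vocabulary size and hidden width, V >> d

# Random LM head mapping hidden states (d) to logits (V).
W = rng.standard_normal((V, d)) / np.sqrt(d)

# A typical logit-space gradient for cross-entropy: softmax(logits) - one_hot(target).
logits = rng.standard_normal(V)
probs = np.exp(logits - logits.max())
probs /= probs.sum()
g = probs.copy()
g[0] -= 1.0  # token 0 plays the role of the target

# Backprop through the head keeps only the component of g inside the
# rank-d row space of W; everything orthogonal to it is discarded.
Q, _ = np.linalg.qr(W)        # orthonormal basis of that subspace (V x d)
g_kept = Q @ (Q.T @ g)        # projection of g onto the subspace
kept = np.linalg.norm(g_kept) ** 2 / np.linalg.norm(g) ** 2
print(f"fraction of squared gradient norm preserved: {kept:.4f}")
```

With V = 8192 and d = 128, the preserved fraction comes out near d/V ≈ 1.6%, i.e. roughly 98% of the squared gradient norm never reaches the rest of the network, which is the same order of loss the paper reports for realistic vocabulary sizes.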